Abstract: This paper studies the minimum achievable source coding rate as a function of blocklength $n$ and probability $\epsilon$ that the distortion exceeds a given level $d$. Tight general achievability and converse bounds are derived that hold at arbitrary fixed blocklength. For stationary memoryless sources with separable distortion, the minimum achievable rate is shown to be closely approximated by $R(d) + \sqrt{\frac{V(d)}{n}}\,Q^{-1}(\epsilon)$, where $R(d)$ is the rate-distortion function, $V(d)$ is the rate dispersion (a characteristic of the source that measures its stochastic variability), and $Q^{-1}(\cdot)$ is the inverse of the standard Gaussian complementary cdf.

Index Terms: achievability, converse, finite blocklength regime, lossy source coding, memoryless sources, rate-distortion, Shannon theory.

I. Introduction 

The rate-distortion function characterizes the minimal 
source coding rate compatible with a given distortion level, 
either in average or excess distortion sense, provided that the 
blocklength is permitted to grow without limit. However, in 
some applications, relatively short blocklengths are common
due to both delay and complexity constraints. It is therefore
of critical practical interest to assess the unavoidable penalty 
over the rate-distortion function required to sustain the desired 
fidelity at a given fixed blocklength. Neither the lossy source 
coding theorem nor the reliability function, which gives the 
asymptotic exponential decay of the probability of exceeding a 
given distortion level when compressing at a fixed rate, provides
an answer to that question.

This paper presents new achievability and converse bounds 
to the minimum sustainable rate as a function of blocklength 
and excess probability, valid for general sources and general 
distortion measures. In addition, for stationary memoryless 
sources with separable (i.e., additive, or per-letter) distortion, 
we show that the finite blocklength coding rate is well approximated by

$$R(n,d,\epsilon) \approx R(d) + \sqrt{\frac{V(d)}{n}}\,Q^{-1}(\epsilon) \tag{1}$$

where $n$ is the blocklength, $\epsilon$ is the probability that the distortion incurred by the reproduction exceeds $d$, and $V(d)$ is the rate-dispersion function. The evaluation of the new bounds
is detailed for: 

• the stationary discrete memoryless source (DMS) with 
symbol error rate distortion; 
• the stationary Gaussian memoryless source (GMS) with mean-square error distortion;

• the stationary binary memoryless source when the compressor observes it through the binary erasure channel (BES), and the distortion measure is bit error rate.

In the most basic special case, namely that of the equiprobable 
source with symbol error rate distortion, the rate-dispersion 
function is zero, and the finite blocklength coding rate is 
approximated by 



$$R(n,d,\epsilon) = R(d) + \frac{1}{2}\frac{\log n}{n} + o\left(\frac{\log n}{n}\right) \tag{2}$$



Section II sets up the problem, introduces the definitions of the finite blocklength fundamental limits and presents the basic notation and properties of the information density and related quantities used throughout the paper. Section III reviews the few existing finite blocklength achievability and converse bounds for lossy compression, as well as various relevant asymptotic refinements of Shannon's lossy source coding theorem. Section IV shows the new general upper and lower bounds to the minimum rate at a given blocklength. Section V studies the asymptotic behavior of the bounds using Gaussian approximation analysis. Sections VI, VII, VIII and IX focus on the binary memoryless source (BMS)¹, DMS, BES and GMS, respectively.



II. Preliminaries 

A. Operational definitions 

In fixed-length lossy compression, the output of a general source with alphabet $\mathcal{A}$ and source distribution $P_X$ is mapped to one of the $M$ codewords from the reproduction alphabet $\mathcal{B}$. A lossy code is a (possibly randomized) pair of mappings $\mathsf{f}\colon \mathcal{A}\mapsto\{1,\ldots,M\}$ and $\mathsf{c}\colon \{1,\ldots,M\}\mapsto\mathcal{B}$. A distortion measure $\mathsf{d}\colon \mathcal{A}\times\mathcal{B}\mapsto[0,+\infty]$ is used to quantify the performance of a lossy code. Given decoder $\mathsf{c}$, the best encoder simply maps the source output to the closest (in the sense of the distortion measure) codeword, i.e. $\mathsf{f}(x) = \arg\min_m \mathsf{d}(x,\mathsf{c}(m))$. The average distortion over the source statistics is a popular performance criterion. A stronger criterion is also used, namely, the probability of exceeding a given distortion level (called excess-distortion probability). The following definitions abide by the excess distortion criterion.

¹Although the results in Section VI are a special case of those in Section VII, it is enlightening to specialize our results to the simplest possible setting.
Definition 1. An $(M,d,\epsilon)$ code for $\{\mathcal{A},\mathcal{B},P_X,\mathsf{d}\colon\mathcal{A}\times\mathcal{B}\mapsto[0,+\infty]\}$ is a code with $|\mathsf{f}| = M$ such that $\mathbb{P}[\mathsf{d}(X,\mathsf{c}(\mathsf{f}(X))) > d] \le \epsilon$.

The minimum achievable code size at excess-distortion probability $\epsilon$ and distortion $d$ is defined by

$$M^\star(d,\epsilon) = \min\{M\colon \exists\,(M,d,\epsilon)\text{ code}\} \tag{3}$$

Note that the special case $d = 0$ and $\mathsf{d}(x,y) = 1\{x\ne y\}$ corresponds to almost-lossless compression.

Definition 2. In the conventional fixed-to-fixed (or block) setting in which $\mathcal{A}$ and $\mathcal{B}$ are the $n$-fold Cartesian products of alphabets $\mathsf{A}$ and $\mathsf{B}$, an $(M,d,\epsilon)$ code for $\{\mathsf{A}^n,\mathsf{B}^n,P_{X^n},\mathsf{d}_n\colon\mathsf{A}^n\times\mathsf{B}^n\mapsto[0,+\infty]\}$ is called an $(n,M,d,\epsilon)$ code.

Fix $\epsilon$, $d$ and blocklength $n$. The minimum achievable code size and the finite blocklength rate-distortion function (excess distortion) are defined by, respectively,

$$M^\star(n,d,\epsilon) = \min\{M\colon \exists\,(n,M,d,\epsilon)\text{ code}\} \tag{4}$$

$$R(n,d,\epsilon) = \frac{1}{n}\log M^\star(n,d,\epsilon) \tag{5}$$

Alternatively, using an average distortion criterion, we employ the following notations.

Definition 3. An $(M,d)$ code for $\{\mathcal{A},\mathcal{B},P_X,\mathsf{d}\colon\mathcal{A}\times\mathcal{B}\mapsto[0,+\infty]\}$ is a code with $|\mathsf{f}| = M$ such that $\mathbb{E}[\mathsf{d}(X,\mathsf{c}(\mathsf{f}(X)))] \le d$. The minimum achievable code size at average distortion $d$ is defined by

$$M^\star(d) = \min\{M\colon \exists\,(M,d)\text{ code}\} \tag{6}$$

Definition 4. If $\mathcal{A}$ and $\mathcal{B}$ are the $n$-fold Cartesian products of alphabets $\mathsf{A}$ and $\mathsf{B}$, an $(M,d)$ code for $\{\mathsf{A}^n,\mathsf{B}^n,P_{X^n},\mathsf{d}_n\colon\mathsf{A}^n\times\mathsf{B}^n\mapsto[0,+\infty]\}$ is called an $(n,M,d)$ code.

Fix $d$ and blocklength $n$. The minimum achievable code size and the finite blocklength rate-distortion function (average distortion) are defined by, respectively,

$$M^\star(n,d) = \min\{M\colon \exists\,(n,M,d)\text{ code}\} \tag{7}$$

$$R(n,d) = \frac{1}{n}\log M^\star(n,d) \tag{8}$$



In the limit of long blocklengths, the minimum achievable rate is characterized by the rate-distortion function [1], [2].

Definition 5. The rate-distortion function is defined as

$$R(d) = \limsup_{n\to\infty} R(n,d) \tag{9}$$

In a similar manner, one can define the distortion-rate functions $D(n,R,\epsilon)$, $D(n,R)$ and $D(R)$.

In the review of prior work in Section III we will use the following concepts related to variable-length coding. A variable-length code is a pair of mappings $\mathsf{f}\colon\mathcal{A}\mapsto\{0,1\}^\star$ and $\mathsf{c}\colon\{0,1\}^\star\mapsto\mathcal{B}$, where $\{0,1\}^\star$ is the set of all possibly empty binary strings. It is said to operate at distortion level $d$ if $\mathbb{P}[\mathsf{d}(X,\mathsf{c}(\mathsf{f}(X))) \le d] = 1$. For a given code $(\mathsf{f},\mathsf{c})$ operating at distortion $d$, the length of the binary codeword assigned to $x\in\mathcal{A}$ is denoted by $\ell(x) = \text{length of }\mathsf{f}(x)$.



B. Tilted information

Denote by

$$\imath_{X;Y}(x;y) = \log\frac{dP_{XY}}{d(P_X\times P_Y)}(x,y) \tag{10}$$

the information density of the joint distribution $P_{XY}$ at $(x,y)\in\mathcal{A}\times\mathcal{B}$. Further, for a discrete random variable $X$, the information in outcome $x$ is denoted by

$$\imath_X(x) = \log\frac{1}{P_X(x)} \tag{11}$$

Under appropriate conditions, the number of bits that it takes to represent $x$ divided by $\imath_X(x)$ converges to 1 as these quantities go to infinity. Note that if $X$ is discrete, then $\imath_{X;X}(x;x) = \imath_X(x)$.

For a given $P_X$ and distortion measure, denote

$$\mathbb{R}_X(d) = \inf_{P_{Y|X}\colon\,\mathbb{E}[\mathsf{d}(X,Y)]\le d} I(X;Y) \tag{12}$$



We impose the following basic restrictions on the source and the distortion measure.

(a) $\mathbb{R}_X(d)$ is finite for some $d$, i.e. $d_{\min} < \infty$, where

$$d_{\min} = \inf\{d\colon \mathbb{R}_X(d) < \infty\} \tag{13}$$

(b) The distortion measure is such that there exists a finite set $E\subset\mathcal{B}$ such that

$$\mathbb{E}\left[\min_{y\in E}\mathsf{d}(X,y)\right] < \infty \tag{14}$$

(c)² The infimum in (12) is achieved by a unique $P_{Y^\star|X}$, and the distortion measure is finite-valued.

The counterpart of (11) in lossy data compression, which roughly corresponds to the number of bits one needs to spend to encode $x$ within distortion $d$, is the following.

Definition 6 ($d$-tilted information). For $d > d_{\min}$, the $d$-tilted information in $x$ is defined as

$$\jmath_X(x,d) = \log\frac{1}{\mathbb{E}\left[\exp\left\{\lambda^\star d - \lambda^\star\mathsf{d}(x,Y^\star)\right\}\right]} \tag{15}$$

where the expectation is with respect to the unconditional distribution³ of $Y^\star$, and

$$\lambda^\star = -\mathbb{R}_X'(d) \tag{16}$$

It can be shown that restriction (c) guarantees differentiability of $\mathbb{R}_X(d)$; thus (15) is well defined. A measure-theoretic proof of the following properties can be found in [3, Lemma 1.4].

Property 1. For $P_{Y^\star}$-almost every $y$,

$$\jmath_X(x,d) = \imath_{X;Y^\star}(x;y) + \lambda^\star\mathsf{d}(x,y) - \lambda^\star d \tag{17}$$

hence the name we adopted in Definition 6, and

$$\mathbb{R}_X(d) = \mathbb{E}[\jmath_X(X,d)] \tag{18}$$

²Restriction (c) is imposed for clarity of presentation. We will show in Section V that it can be dispensed with.

³Henceforth, $Y^\star$ denotes the rate-distortion-achieving reproduction random variable at distortion $d$, i.e. its distribution $P_{Y^\star}$ is the marginal of $P_{Y^\star|X}P_X$, where $P_{Y^\star|X}$ achieves the infimum in (12).
Property 2. For all $y\in\mathcal{B}$,

$$\mathbb{E}\left[\exp\left\{\lambda^\star d - \lambda^\star\mathsf{d}(X,y) + \jmath_X(X,d)\right\}\right] \le 1 \tag{19}$$

with equality for $P_{Y^\star}$-almost every $y$.

Remark 1. While Definition 6 does not cover the case $d = d_{\min}$, for discrete random variables with $\mathsf{d}(x,y) = 1\{x\ne y\}$ it is natural to define the 0-tilted information as

$$\jmath_X(x,0) = \imath_X(x) \tag{20}$$



Example 1. For the BMS with bias $p \le \frac{1}{2}$ and bit error rate distortion,

$$\jmath_{X^n}(x^n,d) = \imath_{X^n}(x^n) - n\,h(d) \tag{21}$$

if $0 \le d < p$, and $\jmath_{X^n}(x^n,d) = 0$ if $d \ge p$.
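A minimal Python sketch of (21), assuming logarithms to base 2 (information in bits) and that the BMS emits 1 with probability $p$. Because (21) depends on $x^n$ only through its Hamming weight $w$, the sketch takes $w$ as input; the helper names are illustrative and not from the paper. Averaging over the binomial weight recovers $\mathbb{E}[\jmath_{X^n}(X^n,d)] = n(h(p) - h(d))$, that is, $n$ times the rate-distortion function, in agreement with (18).

```python
import math


def h2(x: float) -> float:
    """Binary entropy function h(x) in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def info_bms(w: int, n: int, p: float) -> float:
    """i_{X^n}(x^n) in bits for a binary string of length n with w ones."""
    return -w * math.log2(p) - (n - w) * math.log2(1 - p)


def tilted_info_bms(w: int, n: int, p: float, d: float) -> float:
    """d-tilted information (21) in bits; assumes 0 < p <= 1/2 and d >= 0."""
    if d >= p:
        return 0.0
    return info_bms(w, n, p) - n * h2(d)


def mean_tilted_info(n: int, p: float, d: float) -> float:
    """E[j_{X^n}(X^n, d)], averaging over the Binomial(n, p) weight of X^n."""
    return sum(
        math.comb(n, w) * p**w * (1 - p) ** (n - w) * tilted_info_bms(w, n, p, d)
        for w in range(n + 1)
    )


if __name__ == "__main__":
    n, p, d = 20, 0.11, 0.05
    # Both numbers should coincide: n * R(d) with R(d) = h(p) - h(d).
    print(mean_tilted_info(n, p, d), n * (h2(p) - h2(d)))
```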

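Example 2. For the GMS with variance $\sigma^2$ and mean-square error distortion,⁴

$$\jmath_{X^n}(x^n,d) = \frac{n}{2}\log\frac{\sigma^2}{d} + \left(\frac{|x^n|^2}{\sigma^2} - n\right)\frac{\log e}{2} \tag{22}$$

if $0 < d < \sigma^2$, and $\jmath_{X^n}(x^n,d) = 0$ if $d \ge \sigma^2$.

The distortion $d$-ball around $x$ is denoted by

$$B_d(x) = \{y\in\mathcal{B}\colon \mathsf{d}(x,y)\le d\} \tag{23}$$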



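Tilted information is closely related to the (unconditional) probability that $Y^\star$ falls within distortion $d$ from $x$. Indeed, since $\lambda^\star > 0$, for an arbitrary $P_Y$ we have by Markov's inequality,

$$
\begin{aligned}
P_Y(B_d(x)) &= \mathbb{P}[\mathsf{d}(x,Y)\le d] && (24)\\
&\le \mathbb{E}\left[\exp\left\{\lambda^\star d - \lambda^\star\mathsf{d}(x,Y)\right\}\right] && (25)
\end{aligned}
$$

where the probability measure is generated by the unconditional distribution of $Y$. Thus, particularizing (25) to $Y = Y^\star$ and recalling (15),

$$\log\frac{1}{P_{Y^\star}(B_d(x))} \ge \jmath_X(x,d) \tag{26}$$

As we will see in Theorem 6, under certain regularity conditions the equality in (26) can be closely approached.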
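For the BMS of Example 1, the bound (26) can be checked numerically. The sketch below assumes base-2 logarithms, a source with $\mathbb{P}[X=1] = p$ and $0 < d < p \le \frac{1}{2}$, and uses the standard fact that $P_{Y^\star}$ is Bernoulli with $q = \frac{p-d}{1-2d}$ (the input of a backward binary symmetric channel with crossover $d$); this characterization of $Y^\star$ is supplied here as an assumption rather than taken from the text. For $x^n$ of weight $w$, the number of bit errors is a sum of two independent binomials, so $P_{Y^{\star n}}(B_d(x^n))$ is computed exactly. Function names are illustrative.

```python
import math


def h2(x: float) -> float:
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def binom_pmf(k: int, n: int, r: float) -> float:
    return math.comb(n, k) * r**k * (1 - r) ** (n - k)


def ball_prob_bms(w: int, n: int, p: float, d: float) -> float:
    """P_{Y*^n}(B_d(x^n)) for x^n with w ones, Y* i.i.d. Bernoulli(q)."""
    q = (p - d) / (1 - 2 * d)
    k_max = math.floor(n * d + 1e-12)  # at most n*d bit errors are allowed
    total = 0.0
    for a in range(min(w, k_max) + 1):  # ones of x^n reproduced as 0
        for b in range(min(n - w, k_max - a) + 1):  # zeros of x^n reproduced as 1
            total += binom_pmf(a, w, 1 - q) * binom_pmf(b, n - w, q)
    return total


def tilted_info_bms(w: int, n: int, p: float, d: float) -> float:
    """d-tilted information (21) in bits, valid for 0 <= d < p <= 1/2."""
    return -w * math.log2(p) - (n - w) * math.log2(1 - p) - n * h2(d)


if __name__ == "__main__":
    n, p, d = 30, 0.2, 0.1
    for w in (0, 6, 12):
        lhs = -math.log2(ball_prob_bms(w, n, p, d))  # log 1/P_{Y*}(B_d(x))
        print(w, round(lhs, 3), round(tilted_info_bms(w, n, p, d), 3))
```

The printed left column never falls below the right one, as (26) requires. The gap between them is what Theorem 6 and Lemma 2 quantify.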



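C. Generalized tilted information

Often it is more convenient [4] to fix $P_Y$ defined on $\mathcal{B}$ and to consider, in lieu of (12), the following optimization problem:

$$\mathbb{R}_{X,Y}(d) = \min_{P_{Z|X}\colon\,\mathbb{E}[\mathsf{d}(X,Z)]\le d} D(P_{Z|X}\,\|\,P_Y\,|\,P_X) \tag{27}$$

In parallel with Definition 6, define for any $\lambda > 0$

$$\Lambda_Y(x,\lambda) = \log\frac{1}{\mathbb{E}\left[\exp\left\{\lambda d - \lambda\mathsf{d}(x,Y)\right\}\right]} \tag{28}$$

As long as $d > d_{\min|X,Y}$, where

$$d_{\min|X,Y} = \inf\{d\colon \mathbb{R}_{X,Y}(d) < \infty\} \tag{29}$$

the minimum in (27) is always achieved by a $P_{Z^\star|X}$ that satisfies [3]

$$
\begin{aligned}
\log\frac{dP_{Z^\star|X=x}}{dP_Y}(y) &= \log\frac{\exp\left(-\lambda^\star_{X,Y}\mathsf{d}(x,y)\right)}{\mathbb{E}\left[\exp\left(-\lambda^\star_{X,Y}\mathsf{d}(x,Y)\right)\right]} && (30)\\
&= \Lambda_Y(x,\lambda^\star_{X,Y}) - \lambda^\star_{X,Y}\mathsf{d}(x,y) + \lambda^\star_{X,Y}d && (31)
\end{aligned}
$$

where

$$\lambda^\star_{X,Y} = -\mathbb{R}_{X,Y}'(d) \tag{32}$$

⁴We denote the Euclidean norm by $|\cdot|$, i.e. $|x^n|^2 = x_1^2 + \ldots + x_n^2$.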
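For finite alphabets, the quantities in (28)–(32) can be computed directly. The sketch below (natural logarithms, all names hypothetical) forms, for a given $\lambda$, the tilted conditional $P_Y(y)e^{-\lambda\mathsf{d}(x,y)}/\mathbb{E}[e^{-\lambda\mathsf{d}(x,Y)}]$ appearing in (30). It then finds $\lambda^\star_{X,Y}$ by bisection on the condition that the average distortion under $P_X$ equals $d$. The bisection rests on the standard facts that the tilted average distortion is nonincreasing in $\lambda$ and that the constraint in (27) is active whenever $d_{\min|X,Y} < d < \mathbb{E}[\mathsf{d}(X,Y)]$ with $X$ and $Y$ independent. Treat those facts as assumptions of the sketch, not as statements from the text. The identity (30)=(31) itself is algebraic and holds for every $\lambda$, which the demo checks at the solved value.

```python
import math
from typing import Callable, Dict, Hashable

Dist = Dict[Hashable, float]


def lambda_y(x, lam: float, d: float, p_y: Dist, dist: Callable) -> float:
    """Lambda_Y(x, lambda) from (28), in nats."""
    s = sum(py * math.exp(lam * d - lam * dist(x, y)) for y, py in p_y.items())
    return -math.log(s)


def tilted_conditional(x, lam: float, p_y: Dist, dist: Callable) -> Dist:
    """Distribution proportional to P_Y(y) exp(-lambda d(x, y)), as in (30)."""
    weights = {y: py * math.exp(-lam * dist(x, y)) for y, py in p_y.items()}
    norm = sum(weights.values())
    return {y: wt / norm for y, wt in weights.items()}


def avg_tilted_distortion(lam: float, p_x: Dist, p_y: Dist, dist: Callable) -> float:
    """E[d(X, Z)] when Z given X = x follows the tilted conditional."""
    total = 0.0
    for x, px in p_x.items():
        pz = tilted_conditional(x, lam, p_y, dist)
        total += px * sum(pz[y] * dist(x, y) for y in p_y)
    return total


def solve_lambda(d: float, p_x: Dist, p_y: Dist, dist: Callable,
                 hi: float = 100.0, iters: int = 200) -> float:
    """Bisection for lambda with avg_tilted_distortion(lambda) = d (assumed monotone)."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if avg_tilted_distortion(mid, p_x, p_y, dist) > d:
            lo = mid  # distortion still too large: tilt harder
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    hamming = lambda a, b: float(a != b)
    p_x = {0: 0.5, 1: 0.3, 2: 0.2}
    p_y = {0: 0.4, 1: 0.35, 2: 0.25}
    d = 0.3
    lam = solve_lambda(d, p_x, p_y, hamming)
    x = 1
    pz = tilted_conditional(x, lam, p_y, hamming)
    for y in p_y:
        lhs = math.log(pz[y] / p_y[y])  # left side of (30)
        rhs = lambda_y(x, lam, d, p_y, hamming) - lam * hamming(x, y) + lam * d  # (31)
        print(y, round(lhs, 6), round(rhs, 6))
```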



III. Prior work 

In this section, we summarize the main available bounds on 
the fixed-blocklength fundamental limits of lossy compression 
and we review the main relevant asymptotic refinements to 
Shannon's lossy source coding theorem. 

A. Achievability bounds 

Returning to the general setup of Definition 1, the basic general achievability result can be distilled [5] from Shannon's coding theorem for memoryless sources:

Theorem 1 (Achievability, [2], [5]). Fix $P_X$, a positive integer $M$ and $d \ge d_{\min}$. There exists an $(M,d,\epsilon)$ code such that

$$\epsilon \le \inf_{P_{Y|X}}\left\{\mathbb{P}[\mathsf{d}(X,Y) > d] + \inf_{\gamma>0}\left\{\mathbb{P}\left[\imath_{X;Y}(X;Y) > \log M - \gamma\right] + e^{-\exp(\gamma)}\right\}\right\} \tag{33}$$

Theorem 1 is the most general existing achievability result (i.e. existence result of a code with a guaranteed upper bound on error probability). In particular, it allows us to deduce that for stationary memoryless sources with separable distortion measure, i.e. when $P_{X^n} = P_X\times\ldots\times P_X$ and $\mathsf{d}_n(x^n,y^n) = \frac{1}{n}\sum_{i=1}^n\mathsf{d}(x_i,y_i)$, it holds that

$$\limsup_{n\to\infty} R(n,d) \le \mathbb{R}_X(d) \tag{34}$$

$$\limsup_{n\to\infty} R(n,d,\epsilon) \le \mathbb{R}_X(d) \tag{35}$$

where $\mathbb{R}_X(d)$ is defined in (12) and $0 < \epsilon < 1$.

For three particular setups of i.i.d. sources with separable distortion measure, we can cite the achievability bounds of Goblick [6] (fixed-rate compression of a finite-alphabet source), Pinkston [7] (variable-rate compression of a finite-alphabet source) and Sakrison [8] (variable-rate compression of a Gaussian source with mean-square error distortion). Sakrison's achievability bound is summarized below as the least cumbersome of the aforementioned:

Theorem 2 (Achievability, [8]). Fix blocklength $n$, and let $X^n$ be a Gaussian vector with independent components of variance $\sigma^2$. There exists a variable-length code achieving average mean-square error $d$ such that

$$\mathbb{E}[\ell(X^n)] \le -\frac{n-1}{2}\log\left(\frac{d}{\sigma^2} - \frac{1}{1.2n}\right) + \frac{1}{2}\log n + \log 4\pi + \frac{2}{3}\log e + \frac{5\log e}{12(n+1)} + O\left(\frac{1}{n^2}\right) \tag{36}$$
B. Converse bounds

The basic converse used in conjunction with (33) to prove the rate-distortion fundamental limit with average distortion is the following simple result, which follows immediately from the data processing lemma for mutual information:

Theorem 3 (Converse, [2]). Fix $P_X$, integer $M$ and $d \ge d_{\min}$. Any $(M,d)$ code must satisfy

$$\mathbb{R}_X(d) \le \log M \tag{37}$$

where $\mathbb{R}_X(d)$ is defined in (12).



Shannon [2] showed that in the case of stationary memoryless sources with separable distortion, $\mathbb{R}_{X^n}(d) = n\mathbb{R}_X(d)$. Using Theorem 3, it follows that for such sources,

$$\mathbb{R}_X(d) \le R(n,d) \tag{38}$$

for any blocklength $n$ and any $d > d_{\min}$, which together with (34) gives

$$R(d) = \mathbb{R}_X(d) \tag{39}$$

The strong converse for lossy source coding [9], [10] states that if the compression rate $R$ is fixed and $R < \mathbb{R}_X(d)$, then $\epsilon\to 1$ as $n\to\infty$, which together with (35) yields that for i.i.d. sources with separable distortion and any $0 < \epsilon < 1$,

$$\limsup_{n\to\infty} R(n,d,\epsilon) = \mathbb{R}_X(d) = R(d) \tag{40}$$



For prefix-free variable-length lossy compression, the key non-asymptotic converse was obtained by Kontoyiannis [11] (see also [12] for a lossless compression counterpart).

Theorem 4 (Converse, [11]). Assume that the infimum in the right side of (12) is achieved by some conditional distribution $P_{Y^\star|X}$. If a prefix-free variable-length code for $P_X$ operates at distortion level $d$, then for any $\gamma > 0$,

$$\mathbb{P}\left[\ell(X) \le \jmath_X(X,d) - \gamma\right] \le 2^{-\gamma} \tag{41}$$



For DMS with finite alphabet and bounded separable distortion measure, a finite blocklength converse can be distilled from Marton's fixed-rate lossy compression error exponent [13]:

Theorem 5 (Converse, [13]). Consider a DMS with finite input and reproduction alphabets, source distribution $P$ and separable distortion measure with $\max_x\min_y\mathsf{d}(x,y) = 0$, $\Delta_{\max} = \max_{x,y}\mathsf{d}(x,y) < +\infty$. Fix $0 < d < \Delta_{\max}$. Let the corresponding rate-distortion and distortion-rate functions be denoted by $R_P(d)$ and $D_P(R)$, respectively. Fix an arbitrary $(n,M,d,\epsilon)$ code.

• If the code rate $R = \frac{\log M}{n}$ satisfies

$$R < R_P(d) \tag{42}$$

then the excess-distortion probability is bounded away from zero:

$$\epsilon \ge \frac{D_P(R) - d}{\Delta_{\max} - d} \tag{43}$$

• If $R$ satisfies

$$R_P(d) < R < \max_Q R_Q(d) \tag{44}$$

where the maximization is over the set of all probability distributions on $\mathcal{A}$, then

$$\epsilon \ge \sup_{\delta>0,\,Q}\left(\frac{D_Q(R) - d}{\Delta_{\max} - d} - Q^n\left(\mathcal{G}^c_{\delta,n}\right)\right)\exp\left(-n\left(D(Q\|P) + \delta\right)\right) \tag{45}$$

where the supremization is over all probability distributions on $\mathcal{A}$ satisfying $R_Q(d) > R$, and

$$\mathcal{G}_{\delta,n} = \left\{x^n\in\mathcal{A}^n\colon \frac{1}{n}\sum_{i=1}^n\log\frac{Q(x_i)}{P(x_i)} \le D(Q\|P) + \delta\right\}$$



It turns out that the converse in Theorem 5 results in rather loose lower bounds on $R(n,d,\epsilon)$ unless $n$ is very large, in which case the rate-distortion function already gives a tight lower bound. Generalizations of the error exponent results in [13] are found in [14]–[18].



C. Gaussian Asymptotic Approximation

The "lossy asymptotic equipartition property (AEP)" [19], which leads to strong achievability and converse bounds for variable-rate quantization, is concerned with the almost sure asymptotic behavior of the distortion $d$-balls. Second-order refinements of the "lossy AEP" were studied in [11], [20].⁵

Theorem 6 ("Lossy AEP"). For memoryless sources with separable distortion measure satisfying the regularity restrictions (i)–(iv) in Section V,

$$\log\frac{1}{P_{Y^{\star n}}(B_d(X^n))} = \sum_{i=1}^n\jmath_X(X_i,d) + \frac{1}{2}\log n + O(\log\log n)$$

almost surely.

Remark 2. Note the different behavior of almost lossless data compression, where the $\frac{1}{2}\log n$ term is absent:

$$\log\frac{1}{P_{X^n}(X^n)} = \sum_{i=1}^n\imath_X(X_i)$$

Kontoyiannis [11] pioneered the second-order refinement of the variable-length rate-distortion function, showing that for memoryless sources with separable distortion measures the optimum prefix-free description length at distortion level $d$ satisfies

$$\ell^\star(X^n) = nR(d) + \sqrt{n}\,G_n + O(\log n)\quad\text{a.s.} \tag{47}$$

where $G_n$ converges in distribution to a Gaussian random variable with zero mean and variance equal to the rate-dispersion function defined in Section V.

⁵The result of Theorem 6 was pointed out in [11, Proposition 3] as a simple corollary to the analyses in [20], [21]. See [22] for a generalization to α-mixing sources.
D. Asymptotics of redundancy

Considerable attention has been paid to the asymptotic behavior of the redundancy, i.e. the difference between the average distortion $D(n,R)$ of the best $n$-dimensional quantizer and the distortion-rate function $D(R)$. For finite-alphabet i.i.d. sources, Pilc [23] strengthened the positive lossy source coding theorem by showing that

$$D(n,R) - D(R) \le -\frac{\partial D(R)}{\partial R}\frac{\log n}{2n} + o\left(\frac{\log n}{n}\right) \tag{48}$$

Zhang, Yang and Wei [24] proved a converse to (48), thereby showing that for memoryless sources with finite alphabet,

$$D(n,R) - D(R) = -\frac{\partial D(R)}{\partial R}\frac{\log n}{2n} + o\left(\frac{\log n}{n}\right) \tag{49}$$

Using a geometric approach akin to that of Sakrison [8], Wyner [25] showed that (48) also holds for stationary Gaussian sources with mean-square error distortion, while Yang and Zhang [20] extended (48) to abstract alphabets. Note that as the average overhead over the distortion-rate function is dwarfed by its standard deviation, the analyses of [20], [23]–[25] are bound to be overly optimistic since they neglect the stochastic variability of the distortion.



IV. New finite blocklength bounds 



In this section we give achievability and converse results for any source and any distortion measure according to the setup of Section II. When we apply these results in Sections V–IX, the source $X$ becomes an $n$-tuple $(X_1,\ldots,X_n)$.



A. Converse bounds 

Our first result is a general converse bound. 

Theorem 7 (Converse). Assume the basic conditions (a)–(c) in Section II are met. Fix $d > d_{\min}$. Any $(M,d,\epsilon)$ code must satisfy

$$\epsilon \ge \sup_{\gamma\ge 0}\left\{\mathbb{P}\left[\jmath_X(X,d) \ge \log M + \gamma\right] - \exp(-\gamma)\right\} \tag{50}$$
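As an illustration, the converse (50) is easy to evaluate for the BMS of Example 1, where $\jmath_{X^n}(X^n,d)$ depends only on the Hamming weight of $X^n$. The Python sketch below assumes base-2 logarithms (so $\exp(-\gamma)$ becomes $2^{-\gamma}$) and $0 < d < p < \frac{1}{2}$. It evaluates the supremum over $\gamma$ only at the jump points $\gamma = \jmath - \log_2 M$ of the tail probability. That suffices because between jumps the probability is constant while $-2^{-\gamma}$ increases. Bisection over $\log_2 M$ then yields a rate below which (50) rules out every code. Function names are illustrative.

```python
import math


def h2(x: float) -> float:
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def binom_pmf(w: int, n: int, p: float) -> float:
    """Binomial pmf computed in the log domain to avoid overflow for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(w + 1) - math.lgamma(n - w + 1)
               + w * math.log(p) + (n - w) * math.log1p(-p))
    return math.exp(log_pmf)


def converse_eps(n: int, p: float, d: float, log2_m: float) -> float:
    """Lower bound (50) on the excess-distortion probability when log2 M = log2_m."""
    assert 0.0 < d < p < 0.5
    # j(w) = i_{X^n}(x^n) - n h(d) is strictly increasing in the weight w since p < 1/2.
    j = [-w * math.log2(p) - (n - w) * math.log2(1 - p) - n * h2(d) for w in range(n + 1)]
    pmf = [binom_pmf(w, n, p) for w in range(n + 1)]
    best, tail = 0.0, 0.0
    for w in range(n, -1, -1):  # tail = P[j(W) >= j(w)]
        tail += pmf[w]
        gamma = j[w] - log2_m
        if gamma >= 0.0:
            best = max(best, tail - 2.0 ** (-gamma))
    return best


def converse_rate(n: int, p: float, d: float, eps: float, iters: int = 60) -> float:
    """Rate r (bits/sample) such that (50) excludes every code with log2 M <= n r."""
    lo, hi = 0.0, -n * math.log2(p) - n * h2(d)  # (50) vanishes at log2 M = max j
    if converse_eps(n, p, d, lo) <= eps:
        return 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if converse_eps(n, p, d, mid) > eps:
            lo = mid  # every code of this size still has too large an excess probability
        else:
            hi = mid
    return lo / n


if __name__ == "__main__":
    n, p, d, eps = 200, 0.11, 0.05, 0.1
    print(converse_rate(n, p, d, eps), h2(p) - h2(d))
```

Returning `lo` rather than the midpoint keeps the result a genuine lower bound on $R(n,d,\epsilon)$, at the cost of the (negligible) bisection tolerance.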



Proof: Let the encoder and decoder be the random transformations $P_{Z|X}$ and $P_{Y|Z}$, where $Z$ takes values in $\{1,\ldots,M\}$. Let $Q_Z$ be equiprobable on $\{1,\ldots,M\}$, and let $Q_Y$ denote the marginal of $P_{Y|Z}Q_Z$. We have, for any $\gamma \ge 0$,⁶

$$
\begin{aligned}
\mathbb{P}[\jmath_X(X,d) \ge \log M + \gamma] &= \mathbb{P}[\jmath_X(X,d) \ge \log M + \gamma,\ \mathsf{d}(X,Y) > d]\\
&\quad + \mathbb{P}[\jmath_X(X,d) \ge \log M + \gamma,\ \mathsf{d}(X,Y) \le d] && (51)\\
&\le \epsilon + \mathbb{P}[\jmath_X(X,d) \ge \log M + \gamma,\ \mathsf{d}(X,Y) \le d] && (52)\\
&= \epsilon + \sum_{x\in\mathcal{A}} P_X(x)\sum_{z=1}^M P_{Z|X}(z|x)\sum_{y\in B_d(x)} P_{Y|Z}(y|z)\,1\{M \le \exp(\jmath_X(x,d) - \gamma)\} && (53)\\
&\le \epsilon + \exp(-\gamma)\sum_{x\in\mathcal{A}} P_X(x)\exp(\jmath_X(x,d))\,\frac{1}{M}\sum_{z=1}^M\sum_{y\in B_d(x)} P_{Y|Z}(y|z) && (54)\\
&= \epsilon + \exp(-\gamma)\sum_{x\in\mathcal{A}} P_X(x)\exp(\jmath_X(x,d))\,Q_Y(B_d(x)) && (55)\\
&\le \epsilon + \exp(-\gamma)\sum_{y\in\mathcal{B}} Q_Y(y)\sum_{x\in\mathcal{A}} P_X(x)\exp\left(\lambda^\star d - \lambda^\star\mathsf{d}(x,y) + \jmath_X(x,d)\right) && (56)\\
&\le \epsilon + \exp(-\gamma) && (57)
\end{aligned}
$$

where

• (54) follows by upper-bounding

$$P_{Z|X}(z|x)\,1\{M \le \exp(\jmath_X(x,d) - \gamma)\} \le \frac{\exp(-\gamma)}{M}\exp(\jmath_X(x,d)) \tag{58}$$

for every $(x,z)\in\mathcal{A}\times\{1,\ldots,M\}$,
• (56) uses (25) particularized to $Y$ distributed according to $Q_Y$, and
• (57) is due to (19).

■

Remark 3. Theorem 7 gives a pleasing generalization of the almost-lossless data compression converse bound [2], [26, Lemma 1.3.2]. In fact, skipping (56), the above proof applies to the case $d = 0$ and $\mathsf{d}(x,y) = 1\{x\ne y\}$ that corresponds to almost-lossless data compression.

Remark 4. As explained in Appendix C, condition (c) can be dropped from the assumptions of Theorem 7.

Our next converse result, which is tighter than the one in Theorem 7 in some cases, is based on binary hypothesis testing. The optimal performance achievable among all randomized tests $P_{W|X}\colon\mathcal{A}\mapsto\{0,1\}$ between probability distributions $P$ and $Q$ on $\mathcal{A}$ is denoted by (1 indicates that the test chooses $P$)⁷

$$\beta_\alpha(P,Q) = \min_{P_{W|X}\colon\,\mathbb{P}[W=1]\ge\alpha}\mathbb{Q}[W=1] \tag{59}$$

⁶We write summations over alphabets for simplicity. All our results in Sections IV and V hold for arbitrary probability spaces.

⁷Throughout, $P$, $Q$ denote distributions, whereas $\mathbb{P}$, $\mathbb{Q}$ are used for the corresponding probabilities of events on the underlying probability space.
Theorem 8 (Converse). Let $P_X$ be the source distribution defined on the alphabet $\mathcal{A}$. Any $(M,d,\epsilon)$ code must satisfy

$$M \ge \sup_Q\frac{\beta_{1-\epsilon}(P_X,Q)}{\sup_{y\in\mathcal{B}}\mathbb{Q}[\mathsf{d}(X,y)\le d]} \tag{60}$$

where the supremum is over all distributions on $\mathcal{A}$.

Proof: Let $(P_{Z|X},P_{Y|Z})$ be an $(M,d,\epsilon)$ code. Fix a distribution $Q$ on $\mathcal{A}$, and observe that $W = 1\{\mathsf{d}(X,Y)\le d\}$ defines a (not necessarily optimal) hypothesis test between $P_X$ and $Q$ with $\mathbb{P}[W=1]\ge 1-\epsilon$. Thus,

$$
\begin{aligned}
\beta_{1-\epsilon}(P_X,Q) &\le \sum_{x\in\mathcal{A}} Q(x)\sum_{m=1}^M P_{Z|X}(m|x)\sum_{y\in\mathcal{B}} P_{Y|Z}(y|m)\,1\{\mathsf{d}(x,y)\le d\}\\
&\le \sum_{m=1}^M\sum_{y\in\mathcal{B}} P_{Y|Z}(y|m)\sum_{x\in\mathcal{A}} Q(x)\,1\{\mathsf{d}(x,y)\le d\} && (61)\\
&\le \sum_{m=1}^M\sum_{y\in\mathcal{B}} P_{Y|Z}(y|m)\sup_{y'\in\mathcal{B}}\mathbb{Q}[\mathsf{d}(X,y')\le d] && (62)\\
&= M\sup_{y\in\mathcal{B}}\mathbb{Q}[\mathsf{d}(X,y)\le d] && (63)
\end{aligned}
$$

■

Suppose for a moment that $X$ takes values on a finite alphabet, and let us further lower bound (60) by taking $Q$ to be the equiprobable distribution on $\mathcal{A}$, $Q = U$. Consider the set $\Omega\subset\mathcal{A}$ that has total probability $1-\epsilon$ and contains the most probable source outcomes, i.e. for any source outcome $x\in\Omega$, there is no element outside $\Omega$ having probability greater than $P_X(x)$. For any $x\in\Omega$, the optimum binary hypothesis test (with error probability $\epsilon$) between $P_X$ and $Q$ must choose $P_X$. Thus the numerator of (60) evaluated with $Q = U$ is proportional to the number of elements in $\Omega$, while the denominator is proportional to the number of elements in a distortion ball of radius $d$. Therefore (60) evaluated with $Q = U$ yields a lower bound to the minimum number of $d$-balls required to cover $\Omega$.
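For the BMS with bit error rate distortion, the argument above can be carried out explicitly. In the Python sketch below (an illustration under stated assumptions, with hypothetical function names), $\mathcal{A} = \{0,1\}^n$, $\mathbb{P}[X_i = 1] = p < \frac{1}{2}$, and $Q = U$ is uniform. The Neyman-Pearson test therefore accepts sequences in order of increasing Hamming weight and randomizes within the last weight class. The ratio in (60) then equals the (fractional) size of $\Omega$ divided by the size $\sum_{k\le nd}\binom{n}{k}$ of a Hamming ball, and the sketch returns its base-2 logarithm. Only $Q = U$ is tried instead of the supremum over $Q$, so the result is a valid but possibly weaker lower bound on $\log_2 M$.

```python
import math


def binom_pmf(w: int, n: int, p: float) -> float:
    """P[weight of X^n = w] for i.i.d. Bernoulli(p) bits, via log-gamma."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(w + 1) - math.lgamma(n - w + 1)
               + w * math.log(p) + (n - w) * math.log1p(-p))
    return math.exp(log_pmf)


def log2_ball_size(n: int, d: float) -> float:
    """log2 of the number of binary strings within Hamming distance n*d of a given one."""
    k_max = math.floor(n * d + 1e-12)
    return math.log2(sum(math.comb(n, k) for k in range(k_max + 1)))


def theorem8_uniform_log2m(n: int, p: float, d: float, eps: float) -> float:
    """log2 of the right side of (60) evaluated with Q uniform on {0,1}^n."""
    assert 0.0 < p < 0.5 and 0.0 < eps < 1.0 and d >= 0.0
    target, cum, accepted = 1.0 - eps, 0.0, 0
    for w in range(n + 1):  # most likely sequences first, since p < 1/2
        mass = binom_pmf(w, n, p)
        if cum + mass >= target:
            frac = (target - cum) / mass  # randomize within the last weight class
            scale = 2**40  # fixed-point scaling keeps the big-integer count exact
            count = accepted * scale + int(frac * scale) * math.comb(n, w)
            return math.log2(count) - 40 - log2_ball_size(n, d)
        cum += mass
        accepted += math.comb(n, w)
    raise ValueError("target probability not reached (floating-point rounding)")


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, theorem8_uniform_log2m(n, 0.11, 0.05, 0.1) / n)  # bits per sample
```

With $d = 0$ the ball contains a single point, so the returned value is $\log_2$ of the fractional count of accepted sequences.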

Remark 5. In general, the lower bound in Theorem 8 is not achievable due to overlaps between distortion $d$-balls that comprise the covering. One special case when it is in fact achievable is almost lossless data compression on a countable alphabet $\mathcal{A}$. To encompass that case, it is convenient to relax the restriction in (59) that requires $Q$ to be a probability measure and allow it to be a $\sigma$-finite measure, so that $\beta_\alpha(P_X,Q)$ is no longer bounded by 1.⁸ Note that Theorem 8 would still hold. Letting $U$ be the counting measure on $\mathcal{A}$ (i.e. $U$ assigns unit weight to each letter), we have (Appendix A)

$$\beta_{1-\epsilon}(P_X,U) \le M^\star(0,\epsilon) \le \beta_{1-\epsilon}(P_X,U) + 1 \tag{64}$$

The lower bound in (64) is satisfied with equality whenever $\beta_{1-\epsilon}(P_X,U)$ is achieved by a non-randomized test.

⁸The Neyman-Pearson lemma generalizes to $\sigma$-finite measures.



B. Achievability bounds 

The following result gives an exact analysis of the excess 
probability of random coding, which holds in full generality. 

Theorem 9 (Exact performance of random coding). Denote by $\epsilon_d(c_1,\ldots,c_M)$ the probability of exceeding distortion level $d$ achieved by the optimum encoder with codebook $(c_1,\ldots,c_M)$. Let $Y_1,\ldots,Y_M$ be independent, distributed according to an arbitrary distribution $P_Y$ on the reproduction alphabet. Then

$$\mathbb{E}[\epsilon_d(Y_1,\ldots,Y_M)] = \mathbb{E}\left[\left(1 - P_Y(B_d(X))\right)^M\right] \tag{65}$$



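Proof: Upon observing the source output $x$, the optimum encoder chooses arbitrarily among the members of the set

$$\arg\min_{i=1,\ldots,M}\mathsf{d}(x,c_i)$$

The indicator function of the event that the distortion exceeds $d$ is

$$1\left\{\min_{i=1,\ldots,M}\mathsf{d}(x,c_i) > d\right\} = \prod_{i=1}^M 1\{\mathsf{d}(x,c_i) > d\} \tag{66}$$

Averaging over both the input $X$ and the choice of codewords chosen independently of $X$, we get

$$
\begin{aligned}
\mathbb{E}\left[\prod_{i=1}^M 1\{\mathsf{d}(X,Y_i) > d\}\right] &= \mathbb{E}\left[\mathbb{E}\left[\prod_{i=1}^M 1\{\mathsf{d}(X,Y_i) > d\}\,\middle|\,X\right]\right] && (67)\\
&= \mathbb{E}\left[\prod_{i=1}^M\mathbb{E}\left[1\{\mathsf{d}(X,Y_i) > d\}\,\middle|\,X\right]\right] && (68)\\
&= \mathbb{E}\left[\left(\mathbb{P}[\mathsf{d}(X,Y) > d\,|\,X]\right)^M\right] && (69)
\end{aligned}
$$

where in (68) we have used the fact that $Y_1,\ldots,Y_M$ are independent even when conditioned on $X$. ■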
Invoking Shannon's random coding argument, the following achievability result follows immediately from Theorem 9.

Theorem 10 (Achievability). There exists an $(M,d,\epsilon)$ code with

$$\epsilon \le \inf_{P_Y}\mathbb{E}\left[\left(1 - P_Y(B_d(X))\right)^M\right] \tag{70}$$

where the infimization is over all random variables defined on $\mathcal{B}$, independent of $X$.

While the right side of (70) gives the exact performance of random coding, Shannon's random coding bound (Theorem 1) was obtained by upper bounding the performance of random coding. As a consequence, the result in Theorem 10 is tighter than Shannon's random coding bound (Theorem 1), but it is also harder to compute.



Applying $(1-x)^M \le e^{-Mx}$ to (70), one obtains the following more numerically stable bound.

Corollary 11 (Achievability). There exists an $(M,d,\epsilon)$ code with

$$\epsilon \le \inf_{P_Y}\mathbb{E}\left[e^{-M P_Y(B_d(X))}\right] \tag{71}$$

where the infimization is over all random variables defined on $\mathcal{B}$, independent of $X$.
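For the BMS, (70) and (71) can be evaluated exactly for the particular (not necessarily optimal) choice $P_{Y^n} = P_{Y^\star}\times\ldots\times P_{Y^\star}$, with $P_{Y^\star}$ Bernoulli with $q = \frac{p-d}{1-2d}$ (a standard characterization assumed here). By symmetry, $P_{Y^n}(B_d(x^n))$ depends on $x^n$ only through its weight, so the expectation over $X^n$ reduces to a sum over the binomial weight. The Python sketch below (hypothetical helper names, base-2 logarithms) takes $\log_2 M$ rather than $M$ and computes $(1-b)^M$ as $\exp(M\log(1-b))$ for numerical stability. It assumes $\log_2 M < 1024$ so that $M$ fits in a float.

```python
import math


def binom_pmf(k: int, n: int, r: float) -> float:
    """Binomial pmf via log-gamma, safe for moderately large n."""
    if n == 0:
        return 1.0 if k == 0 else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(r) + (n - k) * math.log1p(-r))
    return math.exp(log_pmf)


def ball_prob_bms(w: int, n: int, p: float, d: float) -> float:
    """P_Y(B_d(x^n)) for weight-w x^n and Y^n i.i.d. Bernoulli((p - d) / (1 - 2d))."""
    q = (p - d) / (1 - 2 * d)
    k_max = math.floor(n * d + 1e-12)
    total = 0.0
    for a in range(min(w, k_max) + 1):  # ones of x^n reproduced as 0
        for b in range(min(n - w, k_max - a) + 1):  # zeros of x^n reproduced as 1
            total += binom_pmf(a, w, 1 - q) * binom_pmf(b, n - w, q)
    return min(total, 1.0)


def random_coding_bounds(n: int, p: float, d: float, log2_m: float):
    """Right sides of (70) and (71) for the i.i.d. choice of P_Y (not the infimum)."""
    m = 2.0 ** log2_m
    exact, relaxed = 0.0, 0.0
    for w in range(n + 1):
        b = ball_prob_bms(w, n, p, d)
        pw = binom_pmf(w, n, p)
        exact += pw * (0.0 if b >= 1.0 else math.exp(m * math.log1p(-b)))  # (1 - b)^M
        relaxed += pw * math.exp(-m * b)  # e^{-M b}
    return exact, relaxed


if __name__ == "__main__":
    for log2_m in (30.0, 35.0, 40.0):
        print(log2_m, random_coding_bounds(100, 0.11, 0.05, log2_m))
```

Because only one $P_Y$ is tried, each number upper bounds the infimum in (70) or (71). The first entry never exceeds the second, mirroring the relation between Theorem 10 and Corollary 11.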
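The last result in this section will come in handy in the analysis of the bound in Theorem 10 (see Section II-C for related notation).

Lemma 1. For an arbitrary $P_Y$ on $\mathcal{B}$,

$$P_Y(B_d(x)) \ge \sup_{P_{\hat X},\,\gamma>0}\exp\left(-\Lambda_Y(x,\lambda^\star_{\hat X,Y}) - \lambda^\star_{\hat X,Y}\gamma\right)\mathbb{P}\left[d-\gamma < \mathsf{d}(x,Z^\star)\le d\,\middle|\,\hat X = x\right] \tag{72}$$

where $P_{Z^\star|\hat X}$ is the minimizer in (27) with $X$ replaced by $\hat X$, and the supremization is over all $P_{\hat X}$ on $\mathcal{A}$ such that

$$d > d_{\min|\hat X,Y} \tag{73}$$

Proof: We streamline the treatment in [20, (3.26)]. Fix $\gamma > 0$ and distribution $P_{\hat X}$ on the input alphabet $\mathcal{A}$. We have

$$
\begin{aligned}
P_Y(B_d(x)) &\ge \sum_{y\in B_d(x)\setminus B_{d-\gamma}(x)} P_Y(y) && (74)\\
&\ge \sum_{y\in B_d(x)\setminus B_{d-\gamma}(x)} P_Y(y)\exp\left(\lambda^\star_{\hat X,Y}d - \lambda^\star_{\hat X,Y}\mathsf{d}(x,y) - \lambda^\star_{\hat X,Y}\gamma\right) && (75)\\
&= \exp\left(-\Lambda_Y(x,\lambda^\star_{\hat X,Y}) - \lambda^\star_{\hat X,Y}\gamma\right)\sum_{y\in B_d(x)\setminus B_{d-\gamma}(x)} P_{Z^\star|\hat X=x}(y) && (76)\\
&= \exp\left(-\Lambda_Y(x,\lambda^\star_{\hat X,Y}) - \lambda^\star_{\hat X,Y}\gamma\right)\mathbb{P}\left[d-\gamma < \mathsf{d}(x,Z^\star)\le d\,\middle|\,\hat X = x\right] && (77)
\end{aligned}
$$

where (75) holds because $y\notin B_{d-\gamma}(x)$ implies

$$\lambda d - \lambda\mathsf{d}(x,y) - \lambda\gamma < 0$$

for all $\lambda > 0$, and (76) takes advantage of (31). ■

V. Gaussian approximation

A. Rate-dispersion function

In the spirit of [27], we introduce the following definition.

Definition 7. Fix $d > d_{\min}$. The rate-dispersion function (squared information units per source output) is defined as

$$
\begin{aligned}
V(d) &= \lim_{\epsilon\to 0}\limsup_{n\to\infty} n\left(\frac{R(n,d,\epsilon) - R(d)}{Q^{-1}(\epsilon)}\right)^2 && (79)\\
&= \lim_{\epsilon\to 0}\limsup_{n\to\infty}\frac{n\left(R(n,d,\epsilon) - R(d)\right)^2}{2\log_e\frac{1}{\epsilon}} && (80)
\end{aligned}
$$

Fix $d$, $0 < \epsilon < 1$, $\eta > 0$, and suppose the target is to sustain the probability of exceeding distortion $d$ bounded by $\epsilon$ at rate $R = (1+\eta)R(d)$. As (1) implies, the required blocklength scales linearly with rate dispersion:

$$n(d,\eta,\epsilon) \approx \frac{V(d)}{R^2(d)}\left(\frac{Q^{-1}(\epsilon)}{\eta}\right)^2 \tag{81}$$

Note that only the first factor depends on the source, while the second depends only on the design specifications.

B. Main result

In addition to the basic conditions of Section II-B, in the remainder of this section we impose the following restrictions on the source and on the distortion measure.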

(i) The source $\{X_i\}$ is stationary and memoryless, $P_{X^n} = P_X\times\ldots\times P_X$.

(ii) The distortion measure is separable, $\mathsf{d}_n(x^n,y^n) = \frac{1}{n}\sum_{i=1}^n\mathsf{d}(x_i,y_i)$.

(iii) The distortion level satisfies $d_{\min} < d < d_{\max}$, where $d_{\min}$ is defined in (13), and $d_{\max} = \inf_{y\in\mathcal{B}}\mathbb{E}[\mathsf{d}(X,y)]$, where averaging is with respect to the unconditional distribution of $X$. The excess-distortion probability satisfies $0 < \epsilon < 1$.

(iv) $\mathbb{E}[\mathsf{d}^{12}(X,Y^\star)] < \infty$, where averaging is with respect to $P_X\times P_{Y^\star}$.

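The main result in this section is the following.

Theorem 12 (Gaussian approximation).⁹ Under restrictions (i)–(iv),

$$R(n,d,\epsilon) = R(d) + \sqrt{\frac{V(d)}{n}}\,Q^{-1}(\epsilon) + \theta\left(\frac{\log n}{n}\right) \tag{82}$$

$$V(d) = \mathrm{Var}[\jmath_X(X,d)] \tag{83}$$

and the remainder term in (82) satisfies

$$
\begin{aligned}
-\frac{1}{2}\frac{\log n}{n} - O\left(\frac{1}{n}\right) &\le \theta\left(\frac{\log n}{n}\right) && (84)\\
&\le C\frac{\log n}{n} + \frac{\log\log n}{n} + O\left(\frac{1}{n}\right) && (85)
\end{aligned}
$$

where

$$C = \frac{1}{2} + \frac{\mathrm{Var}\left[\Lambda'_{Y^\star}(X,\lambda^\star)\right]}{\mathbb{E}\left[\left|\Lambda''_{Y^\star}(X,\lambda^\star)\right|\right]\log e} \tag{86}$$

In (86), $(\cdot)'$ denotes differentiation with respect to $\lambda$, $\Lambda_{Y^\star}(x,\lambda)$ is defined in (28), and $\lambda^\star = -R'(d)$.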
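For the BMS with bit error rate distortion, (82)–(83) and the blocklength estimate (81) reduce to elementary formulas. Indeed, $R(d) = h(p) - h(d)$, and since $\jmath_X(X,d) = \imath_X(X) - h(d)$ by (21) with $n = 1$, $V(d) = \mathrm{Var}[\imath_X(X)] = p(1-p)\log_2^2\frac{1-p}{p}$ for $0 \le d < p$. The Python sketch below assumes base-2 logarithms, drops the $\theta(\frac{\log n}{n})$ remainder, and uses `statistics.NormalDist` from the standard library for $Q^{-1}(\epsilon) = \Phi^{-1}(1-\epsilon)$. Function names are illustrative. The output is the Gaussian approximation, not a bound.

```python
import math
from statistics import NormalDist


def h2(x: float) -> float:
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def q_inv(eps: float) -> float:
    """Inverse of the Gaussian complementary cdf Q."""
    return NormalDist().inv_cdf(1.0 - eps)


def bms_rate_and_dispersion(p: float, d: float):
    """R(d) in bits and V(d) in bits^2 for the BMS, assuming 0 <= d < p <= 1/2."""
    rate = h2(p) - h2(d)
    disp = p * (1 - p) * math.log2((1 - p) / p) ** 2
    return rate, disp


def normal_approx_rate(n: float, p: float, d: float, eps: float) -> float:
    """R(d) + sqrt(V(d)/n) Q^{-1}(eps): the main terms of (82)."""
    rate, disp = bms_rate_and_dispersion(p, d)
    return rate + math.sqrt(disp / n) * q_inv(eps)


def required_blocklength(p: float, d: float, eta: float, eps: float) -> float:
    """Estimate (81) of the blocklength needed to operate at rate (1 + eta) R(d)."""
    rate, disp = bms_rate_and_dispersion(p, d)
    return disp / rate**2 * (q_inv(eps) / eta) ** 2


if __name__ == "__main__":
    p, d, eps = 0.11, 0.05, 0.01
    for n in (100, 1000, 10000):
        print(n, normal_approx_rate(n, p, d, eps))
    print("blocklength for 10% rate overhead:", required_blocklength(p, d, 0.1, eps))
```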



Remark 6. Since the rate-distortion function can be expressed as (see (18) in Section II-B)

$$R(d) = \mathbb{E}[\jmath_X(X,d)] \tag{87}$$

it is equal to the expectation of the random variable whose variance we take in (83), thereby drawing a pleasing parallel with the channel coding results in [27].

Remark 7. For almost lossless data compression, Theorem 12 still holds as long as the random variable $\imath_X(X)$ has finite third moment. Moreover, using (64) the upper bound in (85) can be strengthened (Appendix B) to obtain for $\mathrm{Var}[\imath_X(X)] > 0$

$$R(n,0,\epsilon) = H(X) + \sqrt{\frac{\mathrm{Var}[\imath_X(X)]}{n}}\,Q^{-1}(\epsilon) - \frac{1}{2}\frac{\log n}{n} + O\left(\frac{1}{n}\right) \tag{88}$$

which is consistent with the second-order refinement for almost lossless data compression developed in [29]. If $\mathrm{Var}[\imath_X(X)] = 0$, then

$$R(n,0,\epsilon) = H(X) - \frac{1}{n}\log\frac{1}{1-\epsilon} + \theta_n \tag{89}$$

where

$$0 \le \theta_n \le \frac{\exp(-nH(X))}{(1-\epsilon)n} \tag{90}$$

As we will see in Section VI, in contrast to the lossless case in (88), the remainder term in the lossy case in (82) can be strictly larger than the $-\frac{1}{2}\frac{\log n}{n}$ appearing in (88) even when $V(d) > 0$.

⁹Recently, using an approach based on typical sequences and error exponents, Ingber and Kochman [28] independently found the dispersion of finite alphabet sources. The Gaussian i.i.d. source with mean-square error distortion was treated separately in [28]. The result of Theorem 12 is more general as it applies to sources with abstract alphabets.

Remark 8. As will become apparent in the proof of Theorem 12, if $V(d) = 0$, the lower bound in (82) can be strengthened non-asymptotically:

$$R(n,d,\epsilon) \ge R(d) - \frac{1}{n}\log\frac{1}{1-\epsilon} \tag{91}$$

which aligns nicely with (89).

Remark 9. Let us consider what happens if we drop restriction (c) of Section II-B that $R(d)$ is achieved by the unique conditional distribution $P_{Y^\star|X}$. If several $P_{Y|X}$ achieve $R(d)$, writing $\jmath_{X;Y}(x,d)$ for the $d$-tilted information corresponding to $Y$, Theorem 12 still holds with

$$V(d) = \begin{cases}\min\mathrm{Var}[\jmath_{X;Y}(X,d)] & 0 < \epsilon \le \frac{1}{2}\\[2pt] \max\mathrm{Var}[\jmath_{X;Y}(X,d)] & \frac{1}{2} < \epsilon < 1\end{cases} \tag{92}$$

where the optimization is performed over all $P_{Y|X}$ that achieve the rate-distortion function. Moreover, as explained in Appendix C, Theorem 7 and the converse part of Theorem 12 do not even require existence of a minimizing $P_{Y^\star|X}$.

Let us consider three special cases where $V(d)$ is constant as a function of $d$.

a) Zero dispersion. For a particular value of $d$, $V(d) = 0$ if and only if $\jmath_X(X,d)$ is deterministic with probability 1. In particular, for finite alphabet sources, $V(d) = 0$ if the source distribution $P_X$ maximizes $\mathbb{R}_X(d)$ over all source distributions defined on the same alphabet [28]. Moreover, Dembo and Kontoyiannis [30] showed that under mild conditions, the rate-dispersion function can only vanish for at most finitely many distortion levels $d$ unless the source is equiprobable and the distortion matrix is symmetric with rows that are permutations of one another, in which case $V(d) = 0$ for all $d\in(d_{\min},d_{\max})$.

b) Binary source with bit error rate distortion. Plugging $n = 1$ into (21), we observe that for $0 \le d < p$ the rate-dispersion function reduces to the varentropy [5] of the source,

$$V(d) = V(0) = \mathrm{Var}[\imath_X(X)] \tag{93}$$



c) Gaussian source with mean-square error distortion. Plugging $n = 1$ into (22), we see that

$$V(d) = \frac{1}{2}\log^2 e \tag{94}$$

for all $0 < d < \sigma^2$. Similar to the BMS case, the rate dispersion is equal to the variance of $\log f_X(X)$, where $f_X$ is the Gaussian probability density function.
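The computation behind (94) takes one line. Assume, as in (22), a zero-mean source $X\sim\mathcal{N}(0,\sigma^2)$ and $0 < d < \sigma^2$. Setting $n = 1$ in (22) and using $\mathrm{Var}[X^2] = 2\sigma^4$ gives both claims:

$$
\begin{aligned}
V(d) &= \mathrm{Var}\left[\frac{1}{2}\log\frac{\sigma^2}{d} + \left(\frac{X^2}{\sigma^2} - 1\right)\frac{\log e}{2}\right] = \frac{\log^2 e}{4}\,\frac{\mathrm{Var}[X^2]}{\sigma^4} = \frac{1}{2}\log^2 e,\\
\log f_X(X) &= -\frac{1}{2}\log\left(2\pi\sigma^2\right) - \frac{X^2}{2\sigma^2}\log e \;\Longrightarrow\; \mathrm{Var}[\log f_X(X)] = \frac{\log^2 e}{4}\,\frac{\mathrm{Var}[X^2]}{\sigma^4} = \frac{1}{2}\log^2 e.
\end{aligned}
$$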



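C. Proof of Theorem 12

Before we proceed to proving Theorem 12, we state two auxiliary results. The first is an important tool in the Gaussian approximation analysis of $R(n,d,\epsilon)$.

Theorem 13 (Berry-Esseen CLT, e.g. [31, Ch. XVI.5, Theorem 2]). Fix a positive integer $n$. Let $Z_i$, $i = 1,\ldots,n$, be independent. Then, for any real $t$,

$$\left|\mathbb{P}\left[\sum_{i=1}^n Z_i > n\left(\mu_n + t\sqrt{\frac{V_n}{n}}\right)\right] - Q(t)\right| \le \frac{B_n}{\sqrt{n}} \tag{95}$$

where

$$\mu_n = \frac{1}{n}\sum_{i=1}^n\mathbb{E}[Z_i] \tag{96}$$

$$V_n = \frac{1}{n}\sum_{i=1}^n\mathrm{Var}[Z_i] \tag{97}$$

$$T_n = \frac{1}{n}\sum_{i=1}^n\mathbb{E}\left[\left|Z_i - \mathbb{E}[Z_i]\right|^3\right] \tag{98}$$

$$B_n = \frac{6\,T_n}{V_n^{3/2}} \tag{99}$$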



The second auxiliary result, proven in Appendix [D] is 
a nonasymptotic refinement of the lossy AEP (Theorem [6]l 
tailored to our purposes. 

Lemma 2. Under restrictions there exist constants 

no,c, K > such that for all n > uq, 



Ids 



1 



Py.*(Bd(X«)) 

K 

where C is given by 



< y jx(^»,d) +Clogn + i 



> 1 



(100) 



We start with the converse part. Note that for the converse, 
restriction ([iv]l can be replaced by the following weaker one: 

(iv') The random variable jx(X, rf) has finite absolute third 
moment. 

To verify that ([iv} implies (iv'), observe that by the concavity 
of the logarithm. 



< jx(x, d) + X*d < X*E [d(x, Y*)] 



(101) 



so 



E 



bx(X,d) + A*d|' <X*^E[d^{X,Y*)] (102) 



Proof of the converse part of Theorem \12\ First, observe 
that due to ^ and ©, P^„ = x . . . x P*, and the d-tilted 
information single-letterizes, that is, for a.e. x". 



3x 



.(a;",d) = ^]x{xr,d) 



(103) 



Consider the case V{d) > 0, so that P„ in ( i99] l with Zi ~ 
3x{Xi, d) is finite by restriction (iv'). Let 7 = | logn in ( iSOl l. 
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and choose 

logAf 



iR{d) + ^nV{d)Q'^ (e„) - 7 



e + exp(— 7) + 



(104) 
(105) 



so that R — IhsJM. can be written as the right side of 
with (Hill satisfied. Substituting ( I103l l and (1104b in dSOjl, we 
conclude that for any (Af, d, e') code it must hold that 



Y,]x{X,,d) > nR{d) + ^nV{d)Q'^ (e„) 



i=l 



■ exp(-7) 



(106) 



The proof for V{d) > is complete upon noting that the 
right side of ( IIO6I 1 is lower bounded by e by the Berry-Esseen 
inequality ( |95l l in view of ( llOSI l. 

If V{d) = 0, it follows that jx{X,d) = R{d) almost surely. 
Choosing 7 — log and log A/ = nR{d) — 7 in ( |50l ) it is 
obvious that e' > e. ■ 
Proof of the achievability part of Theorem \12\ The proof 
consists of the asymptotic analysis of the bound in Corollary 
nn using Lemma |2] Denote 



G„ = logA/ - ^jx{xi,d) - Clog) 



(107) 



where constants c and C were defined in Lemma |2] Letting 
X = X" in ( TtT] ) and weakening the right side of dTTT l by 
choosing Py ~ Pyn = Py x . . . x Py , we conclude that there 
exists an {n, M, d, e') code with 



e' < E 
< E 

= E 

+ E 



-MP*„(B<i(X")) 



- cxp(G„ 



K 



(108) 
(109) 



g-oxp(G„)^ Ig^ <l0g 



,-oxp(G„)-^ I > 1q 



log^ 
2 

loggTl 



if 



(110) 



< 



Gn < log 



lege n 



1 



G„ > log 



log^n 



K 

7Ti 



(111) 



where ( |109l l holds for n > no by Lemma |2] and dl 1 II ) follows 
by upper bounding e~*^'^P('^"^ by 1 and respectively. We 
need to show that ( 111 11 1 is upper bounded by e for some R = 

with the remainder satisfying 



l2sM. that can be written as 



( [83] l. Considering first the case V{d) > 0, let 



log A/ = nR{d) + y/nV{d)Q-^ (e„) 
+ C log n + log + c 

Bn + K+l 



(112) 
(113) 



where i?„ is given by (|99] l and is finite by restriction (iv'). 
Substituting (II 12l i into (II lib and applying the Berry-Esseen 



inequality ( |95b to the first term in (II 1 lb . we conclude that 
e' < e for all n such that e„ > 0. 

It remains to tackle the case V{d) — 0, which imphes 
jx(X, d) = R{d) almost surely. Let 

log M = nR{d) + C log n + c + log log^ — 3_ (i 14) 

e — T= 



Substituting M into ( |109b we obtain immediately that e' < e, 
as desired. ■ 

D. Distortion-dispersion function 

One can also consider the related problem of finding the 
minimum excess distortion D{n, R, e) achievable at block- 
length n, rate R and excess-distortion probability e. We define 
the distortion-dispersion function at rate R by 



V(i?) = limlimsup 



n{D{n,R,e)- D{R)y 
21oge^ 



(115) 



For a fixed n and e, the functions R{n,-,e) and D{n,-,e) 
are functional inverses of each other Consequently, the rate- 
dispersion and the distortion-dispersion functions also define 
each other Under mild conditions, it is easy to find one from 
the other: 

Theorem 14. (Distortion dispersion) If R{d) is twice differen- 
tiable, R' {d) 7^ and V{d) is differentiable in some interval 
{d, d] C (dmin, cJmax] then for any rate R such that R = R{d) 
for some d € (d, d) the distortion-dispersion function is given 
by 



V{R) = {D\R)YV{D{R)) 



(116) 



and 



D{n,R,e) = D{R) 

V n \ i 

(ll7) 

where 6{-) satisfies ( |84] |, dSST l. 

Proof: Appendix |E] ■ 
In parallel to dsTb , suppose that the goal is to compress 
at rate R while exceeding distortion d = (1 + rj)D{R) with 
probability not higher than e. As ( II 171 ) implies, the required 
blocklength scales linearly with the distortion-dispersion func- 
tion: 



n(R,'i],e) 



D^{R) 



V 



(118) 



The distortion-dispersion function assumes a particularly 
simple form for the Gaussian memoryless source with mean- 
square error distortion, in which case for any < d < cr^ 



D{R) = (j^exp{-2R) 
V(i?) 



D^R) 
n{R,'i],e) 



= 2 



(119) 
(120) 

(121) 



so in the Gaussian case, the required blocklength is essentially 
independent of the target distortion. 
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VI. Binary memoryless source 

This section particularizes the nonasymptotic bounds in 
Section |IV] and the asymptotic analysis in Section [V] to 
the stationary binary memoryless source with bit error rate 
distortion measure, i.e. d(a;",2/") = i Y^^i ^ i^i Vi)- ^'^^ 
convenience, we denote 



k 

E 

J=0 



(122) 



with the convention ^ ^ ) 

k > n. 



Oiffc<Oand(^)^(::)if 



A. Equiprobable BMS (EBMS) 

The following results pertain to the i.i.d. binary equiproba- 
ble source and hold for < d < ■t^, < e < 1. 

Particularizing (l2ll to the equiprobable case, one observes 
that for all binary n— strings x" 

.7X" ix'\d) = n log 2 - nh{d) = nR{d) (123) 

Then, Theorem |7] reduces to (|9T1 l. Theorem |8] leads to the 
following stronger converse result. 

Theorem 15 (Converse, EBMS). Any (n, M, d, e) code must 
satisfy: 

n 
[nd\ 

Proof: Invoking Theorem [8] with the n— dimensional 
source distribution playing the role of Px therein, we have 

l3l-eiPx^,Q) 



e > 1 - M2- 



(124) 



M > sup inf ^r,/,^ X ,1 



> inf 



y"e{o,i}" PM(^",2/") < d] 
1 -e 



'[d(X",0) < d] 
1 - e 



(125) 
(126) 
(127) 
(128) 



[nd] 

where (1 1261 1 is obtained by substitution Q — Px- 



Theorem 16 (Exact performance of random coding, EBMS). 
The minimal averaged probability that bit error rate exceeds 
d achieved by random coding with M codewords is 

minE [ed (Fi, . . . , Ym)] = (^l - 2"" ^^^^ (129) 

attained by Py equiprobable on {0, 1}". 

Proof For all M > 1, (1 ^ z)^^ is a convex function of 
z on < z < 1, so the right side of (l65T l is lower bounded by 
Jensen's inequality: 

E[l-PYniBd{X^))f > (l-E[Py.(i3,(X"))])*^ 

(130) 

Equality in ( 11301 ) is attained by F" equiprobable on {0, 1}", 
because then 



Py.(Bd(X"))=2-"^ ^^^j ) a.s. 



(131) 



Theorem[T6]leads to an achievability bound since there must 
exist an {M, d, E [cd {Yi, . . . , yA/)]) code. 

Corollary 17 (AchievabiUty, EBMS). There exists an 
(n, M, d, e) code such that 



< 1 - 2-" 



n 
lnd\ 



M 



(132) 



As mentioned in Section [V] after Theorem [12] the EBMS 
with bit error rate distortion has zero rate-dispersion function 
for all d. The asymptotic analysis of the bounds in (1132b and 
(1124b allows for the following more accurate characterization 

of R{n, d, e). 

Theorem 18 (Gaussian approximation, EBMS). The minimum 
achievable rate at blocklength n satisfies 



R{n,d,e) = \og2-h{d) + \^^ + o(- 

2 n \n 

ifO<d< i, and 

i?(n,0,e) = log2~ ilog-^ 
n 1 — 



1 — e 



(133) 



(134) 



where < o„ < 



Proof: Appendix |F] 
A numerical comparison of the achievability bound 
evaluated with stationary memoryless Py^ix", the new 
bounds in (1132b and (1124b as well as the approximation in 
(1133b neglecting the O (^) term is presented in Fig. [T] Note 
that Marton's converse (Theorem |5} is not applicable to the 
EBMS because the region in ( l44b is empty. The achievability 
bound in ( l33T l. while asymptotically optimal, is quite loose in 
the displayed region of blocklengths. The converse bound in 
(1124b and the achievability bound in ( 1132b tightly sandwich 
the finite blocklength fundamental limit. Furthermore, the 
approximation in ( 1133b is quite accurate, although somewhat 
optimistic, for all but very small blocklengths. 

B. Non-equiprobable BMS 

The results in this subsection focus on the i.i.d. binary 
memoryless source with P [X = 1] = p < ^ and apply for 
0<d<p, 0<e<l. The following converse result is a 
simple calculation of the bound in Theorem [T] using ( l2Tb . 

Theorem 19 (Converse, BMS). For any in, M, d, e) code, it 
holds that 

e > sup {P [gn{Z) > \ogM + 7] - exp (-7)} (135) 

7>0 

5„(Z) =Zlogi + (n-Z)log-^ nh{d) (136) 

P 1 

where Z is binomial with success probability p and n degrees 
of freedom. 

An application of Theorem |8] to the specific case of non- 
equiprobable BMS yields the following converse bound: 



□ 
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Fig. 1. Bounds to R{n,d,e) and Gaussian approximation for EBMS, d - 
0.11, e = 10-2. 



Theorem 20 (Converse, BMS). Any (n, M, d, e) code must 
satisfy 



M > 



(")+«Ui) 



[nd] 



where we have denoted the integer 

r* = inax|r: (t) ^ 
[ fc=0 ^ ^ 

and a S [0,1) is the solution to 



-k 



< 1-e 



(137) 



(138) 



k=0 

= 1-e 



n 

r* + 1 



(139) 



Proof: In Theorem [8] the n— dimensional source distri- 
bution Px" plays the role of Px, and we make the possibly 
suboptimal choice Q — U, the equiprobable distribution on 
A — {0,1}''\ The optimal randomized test to decide between 
Px" and U is given by 

{0, |x"|>r* + l 
1, \x^\<r* (140) 
a, |a;"|=r* + l 

where denotes the Hamming weight of x", and a is such 
that j:..^APi^")Pw\x(M^n = 1 - e, so 



mm 

T.^^eA P(x'^)Pw\K(l\x")>l-e 



= 2- 







ir* ) 


^"(r* + l) 







2"" E Pw\x{i\xn 

(141) 



The result is now immediate from (l60t . ■ 
An application of Theorem[TO]to the non-equiprobable BMS 
yields the following achievability bound: 

Theorem 21 (Achievability, BMS). There exists an 
(n, M, d, e) code with 

M 



e < 



k=Q 

where 
and 



l-^i„(fc,t)q*(l-g)"- 



t=0 



p 



Ln{k, t) — 



1 - 2d 
k 



n — k 

t-to 



(142) 
(143) 

(144) 



with to = p+fc-"'^ ]^ i/t-nd <k< t+nd, andL„{k,t) = 
otherwise. 

Proof: We compute an upper bound to ( |70] l for the 
specific case of the BMS. Let Pyn = Py x . . . x Py, 
where Py{1) — q. Note that Py is the marginal the the 
joint distribution that achieves the rate-distortion function (e.g. 
1321 ). The number of binary strings of Hamming weight t 
that lie within Hamming distance nd from a given string of 
Hamming weight k is 



E 



1 — k\ f k 
t-i)-\to 



n — k 



(145) 



as long as t — nd < k < t + nd and is otherwise. It follows 
that if x" has Hamming weight fc, 

n 

Py.(Bd(x")) >^i„(fc,t)q*(l-g)"-* (146) 

t=Q 

Relaxing using ( 1146b . (I1421 l follows. ■ 
The following bound shows that good constant composition 
codes exist. 

Theorem 22 (Achievability, BMS). There exists an 
(n, M, d, e) constant composition code with 

M 



1 



n 
\nq] 



L„(fc, \nq']) 



(147) 

where q and Ln{-,-) are defined in ( |143l l and ( 1144b respec- 
tively. 

Proof: The proof is along the lines of the proof of 
Theorem [21] except that now we let Py>i be equiprobable 
on the set of binary strings of Hamming weight \qn'\ . ■ 
The following asymptotic analysis of P(n, d, e) strengthens 
Theorem [12] 

Theorem 23 (Gaussian approximation, BMS). The minimum 
achievable rate at blocklength n satisfies (182b where 



R{d) = h{p) - h{d) 

= Var[zx(X)] -p(l-p)lof 



.2 1 - P 



(148) 
(149) 
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□ 



and the remainder term in ( |82| l satisfies 



O 



< e 



log n 



^ 1 log n ^ log log n 



O 



(150) 
(151) 



if < d < p, and 



logn 



1 logn 

2 n 



oil 



(l5^ 



ifd = 0. 



□ 
□ 



Proof: The case d = follows immediately from ( I881l. I 

For < d < p, the dispersion (|149l l is easily obtained I 

plugging n = 1 into ( |2TI ). The tightened upper bound f di I 

the remainder (I151t follows via the asymptotic analysis df I 

Theorem |22] shown in Appendix |G] We proceed to show t he I 

converse part, which yields a better term than Theorei E I 
El " □ 
According to the definition of r* in ( 1138b . 
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Fig. 2. Bounds to R{n, d, e) and Gaussian approximation for BMS witli 
p = 2/5, d = 0.11 , e = 10-2. 



for any r < r*, where {Xi} are binary i.i.d. with (1 
In particular, due to (|95] l, (1153b holds for 

Sri 



r + \/np{l - p)Q ^ ( e + — ^ 



= + - P)Q'' (e) + O (1) (155) 

where (1155b follows because in the present case Bn = 



. l-2p+2£* 



have 



which does not depend on n. Using (1137b . — ] q 



M > 



LrJ 



Taking logarithms of both sides of (|156b . we have 
logM 



> log 



- log 



LrJ / - \ [nd\ 
nh[p+ -^^p{l-p)Q-^ {e)) - nh{d) + O (1) 




(158) 

= nh{p) - nh{d) + V^^/p{l-p)h'{p)Q-^ (e) + O (1) 

where (1158b is due to (1359b in Appendix|F] The desired bound 
(fTsTT i follows since h'{p) = log ■ 
Figures [2] and [3] present a numerical comparison of Shan- 
non's achievability bound ( l33T l. the new bounds in (1142b . 
(11371 ) and ( 11351 ) as well as the Gaussian approximation in tionary memoryless sources with alphabet A and symbol eiTor 
([82]) in which we have neglected 6 | i£Sii V xhe achievability rate distortion measure, i.e. d{x'^,y") = ^ J2i=i 1 {xi Ui} 



1000 



Fig. 3. Bounds to R(n, d, e) and Gaussian approximation for BMS with 
p = 2/5, d = 0.11 , e = 10"*. 



VII. Discrete memoryless source 



This section particularizes the bounds in Section HV] to sta- 



bound ^ is very loose and so is Maiton's converse which For convenience, we denote the number of strings within 



is essentially indistinguishable from R{d). The new finite 
blocklength bounds ( 1142b and (|137b are fairly tight unless the 
blocklength is very small. In Fig. |3] obtained with a more 
stringent e, the approximation of Theorem [23] is essentially 
halfway between the converse and achievability bounds. 



Hamming distance k from a given string by 



3=0 



(159) 
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A. Equiprobable DMS (EDMS) 

In this subsection we fix < ci < 1 — < e < 1 and 
assume that all source letters are equiprobable, in which case 
the rate-distortion function is given by [33] 



R{d) = log \A\ - h{d) - d\og{\A\ - 1) 



(160) 



As in the equiprobable binary case. Theorem |7] reduces to 
dOTT l. A stronger converse bound is obtained using Theorem |8] 
in a manner analogous to that of Theorem [Tsl 

Theorem 24 (Converse, EDMS). Any {n, M, d, e) code must 
satisfy: 

e > 1 - Af |^r"^L"rfj (161) 

The following result is a straightforward generalization of 
Theorem [T6l to the non-binary case. 

Theorem 25 (Exact performance of random coding, EDMS). 
The minimal averaged probability that symbol error rate 
exceeds d achieved by random coding with M codewords is 

minE[erf(ri,...,rM)] = (l - l^r^^Ln^j)''' (162) 

Py 

attained by Py equiprobable on A". 

Theorem |25] leads to the following achievability bound. 

Theorem 26 (Achievability, EDMS). There exists an 
(n, M, d, e) code such that 

e< (l-5L„rfj|^r")*' (163) 

The asymptotic analysis of the bounds in ( 11631 ) and ( 1161b 
yields the following tight approximation. 

Theorem 27 (Gaussian approximation, EDMS). The minimum 
achievable rate at blocklength n satisfies 



R{n,d,e)^R{d) + l^^ + o(- 
In \n 



if < d < 1 — -TTT, and 



i?(71,0,6)=log|^|--log-l- 

n 1 — 



(164) 



(165) 



where < o„ < ^ . 

— ''■ — (1 — e)n 

Proof: Appendix iHl ■ 

B. Nonequiprobable DMS 

In this subsection we assume that the source is stationary 
memoryless on an alphabet of m = \A\ letters labeled by 
A — {1, . . . , m}. We assume 



Px{l) > Pxi2) > . . . > Px{m) 



(166) 



and < d < 1 - Px(l), < e < 1. 

Recall that the rate-distortion function is achieved by ll33l 



^ J 1-d-j) " - 

I otherwise 



(167) 



1 — d a = b, a < ra^i 
P*|Y(a|&) ^ It] a^b, a<mn (168) 



where < < 1 is the solution to 

m 

d= ^x^'^) + ^'"^v - 1)'7 

a— ?n^ + l 

rriri — max{a : Px(a) > v} 
The rate-distortion function can be expressed as 



(169) 
(170) 



Rid) = J2 Px{a)ixia) + (1 - d) log(l -d) + (m^ - l)r; log i 



a=l 



(171) 

Note that if < c? < (to— l)Px{m), then to^ = to, ?/ = :;;j3T' 
and ( 11671 ). ( 11681 ) and ( 11711 ) can be simplified. In particular, the 
rate-distortion function on that region is given by 



R{d) = H{X) - h{d) - dlog(TO - 1) 



(172) 



The first result of this section is a particularization of the bound 
in Theorem |7] to the DMS case. 

Theorem 28 (Converse, DMS). For any {n, M, d, e) code, it 
holds that 



e > sup ■ 

7>0 



^jx(^^,d) > logM + 7 



.1=1 




where 



jxia,d) = (l-d)log(l-d)+dlog77 



+ min <^ IX (a), log - 

7] 



(174) 



and 7] is defined in ( 11691 



Proof: Case d = is obvious. For < d < 1 — Px(l), 
differentiating (1171b with respect to d yields 

A^-log-^-' 



(175) 



Plugging ( 1168b and A* into ^17}, one obtains (|174b . ■ 
We adopt the notation of ||34I : 

• type of the string: k = (fci, . . . , km), ki + . . . + km = n 

• probability of a given string of type k: p^ = 

Pxil)''' . . . Pxim)''"' 

• type ordering: j ^ k if and only if > p^ 

• type 1 denotes [n,0, . . . , 0] 

• previous and next types: j — 1 and j + 1, respectively 

• multinomial coefficient: ( ] — — ; 

\kj fci!...fcm! 

The next converse result is a particularization of Theorem 



m 

Theorem 29 (Converse, DMS). Any (n, M, d, e) code must 
satisfy 

k* 



M > 



1 



S 



lnd\ 



where 



k* 



< 1-e 



(176) 



(177) 
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and a £ [0, 1) is the solution to 



1 



= 1 -e 



(178) 



Proof: Consider a binary hypothesis test between the 
n— dimensional source distribution Px" and U, the equiprob- 
able distribution on yl". From Theorem |8] 



M > \A\' 



S 



lnd\ 



(179) 



The calculation of {Px" U) is analogous to the BMS case. 

■ 

The following result guarantees existence 
of a good code with all codewords of type 
t* = ([nP*(l)],...,[nP*K)],0,...,0) where [•] 
denotes rounding off to a neighboring integer so that 

ET=i[r^PYib)]=n holds. 

Theorem 30 (AchievabiUty, DMS). There exists an 
(n, M, d, e) fixed composition code with codewords of type 
t* and 

1 \ M 



k 

m 

L„(k,t*) = [] 



L„(k,t*) (180) 



(181) 



where h ~ [ki, . . . , k^] ranges over all n-types, and ka-types 
ta = {ta,i, ta,mj are given by 



(182) 



where 



5{a,h) 



P^^y{a\b)tl + S{a,b)n 
-9-7—^ — TT E™ 1 1 A,- a ^ h,a < m„ 

tfj(m,, — 1) A^j=m, + l I / 7 — 77 



a — b,a < TOr, 



a > m 



nAa ^ ka- nPx{a), a = 1, . . . , m 



(183) 
(184) 



In ( 1182b , a = l,...,m, b ~ l,...,m,, one/ [•] denotes 
rounding off to a neighboring nonnegative integer so that 



b=l 



(185) 
(186) 
(187) 



and among all possible choices the one that results in the 
largest value for ( II8II 1 is adopted. If no such choice exists, 
L„(k,t^) = 0. 

Proof: We compute an upper bound to dTOl l for the 
specific case of the DMS. Let Py^ be equiprobable on the set 
of TO— ary strings of type t*. To compute the number of strings 
of type t* that are within distortion d from a given string cc" 



of type k, observe that by fixing 2;" we have divided an n- 
string into to bins, the a-th bin corresponding to the letter a 
and having size ka- If ta.b is the number of the letters 6 in a 
sequence y"^ of type t* that fall into a-th bin, the strings a;" 
and y" are within Hamming distance nd from each other as 
long as ( I185l l is satisfied. Therefore, the number of strings of 
type t* that are within Hamming distance nd from a given 
string of type k is bounded by 



En rUMk,n 



(188) 



where the summation in the left side is over all collections 
of fcg - types ta = {taA, ■ ■ ■ ,ta^J, o = 1, ... TO that satisfy 
(I185l l- (ll87b . and inequality ( 1188b is obtained by lower bound- 
ing the sum by the term with ta.b given by ( 11821 ). It follows 
that if a;" has type k. 



Py. (P<i(x")) > 



L„(k,t*) 



(189) 



Relaxing ^ using ( fT89] l. ( fTSOl l follows. ■ 

Remark 10. As n increases, the bound in (|188b becomes 
increasingly tight. This is best understood by checking that 
all strings with ka^b given by ( |182b lie at a Hamming distance 
of approximately nd from some fixed string of type k, and re- 
calling |24| that most of the volume of an n— dimensional ball 
is concentrated near its surface (a similar phenomenon occurs 
in Euclidean spaces as well), so that the largest contribution 
to the sum on the left side of ( 11881 ) comes from the strings 
satisfying (11821 ). 

The following second-order analysis makes use of Theorem 
[T2I and, to strengthen the bounds for the remainder term, of 
Theorems |29] and |30] 

Theorem 31 (Gaussian approximation, DMS). The minimum 
achievable rate at blocklength n, R{n, d, e), satisfies (1821) 
where R{d) is given by (I171I) . and V{d) can be characterized 
parametrically: 



V{d) = Var 



mill \ ix(X),log 



(190) 



where rj depends on d through ( 11691 ), (11701) . Moreover, 
can be replaced by: 



logn 



< 



(to — 1)(to,, — 1) logn loglogn 



O 



IfO<d<{m- l)Px(m), (1190b reduces to 

V{d)=VaT[tx{X)] 
and if d > 0, (184b can be strengthened to 



n 
(191) 



o( - I < 

n . 



Ice 



while if d = 0, 



logn 



llogn^^/1 
2 n \n 



(192) 



(193) 



(194) 



Proof: Using the expression for d— tilted 
information (1174b . we observe that Var [jx(X, d)] = 



15 



Var 



and 



1901 ) follows. The case d^O 
O] leads to (|191t . as we show 



min|«x(X),logi| 
is verified using (ISST l. Trie 
in Appendix U 

When < d < (to - l)Px{m), not only ( fT7T] i and ( fT90l l 
reduce to ( |172| i and ( |192| i respectively, but a tighter converse 
for the i^sii term (fT93] l can be shown. Recall the asymptotic^ — | 
of S\nd\ in ( I388l l (Appendix IHIi. Furthermore, it can be shown 
Il34l that 



k 

E 

i=l 



—= exp < nii — 

\/n \ n 



(195) 



0.12 



0.1 - 



0.08 



0.06 



0.04 



0.02 



for some constant C. Armed with ( 1195b and ( 1388b . we arj | 
ready to proceed to the second-order analysis of (1176) . From! I 
the definition of k* in ( 1177b . 



0.1 



0.2 



0.3 
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0.7 



n m 

- > H{X) + J2 Aa«x(a) 



> e (196) 

for any A with X^aLi — satisfying n(p + A) ^ 
where p = [Px{l), ■ ■ ■ ,Px{m)] (we slightly abused notatioH — I 
here as n(p + A) is not always precisely an n-type; nat u-l— j 
rally, the definition of the type ordering ^ extends to su 
cases). Noting that E[ix{Xi)] = H{X) and Var[«x(Xi)] ^ 
Var [ix(X)], we conclude from the Berry-Esseen CLT (|95t th k — | 
( fT96l ) holds for 1^ 

□ 
□ 
(19'DZI 



a=l 



10 

10' 
10^ 



2 10 



10 
10^ 
10^ 




E= 10" 



where Bn is given by 
of (1176b . we have 

logM 



> log 



Taking logarithms of both side 



0.1 0.2 0.3 0.4 0.5 0.6 0.7 
d 



□ 



log5'L„dj 



(198) 



Fig. 4. Rate-dispersion function (bits) and the blocklengtli )8U required to 
sustain R = l.lR{d) provided that excess-distortion probability is bounded 
by e for DMS with Px 



f- - - -1 
L3' 4' 4' 6 J 



> logEl i ) -log^Lnrfj (199) 

i=l ^ ^ 

> nH{p + A)~ nh{d) - nd \og{m - 1) + 0(1) (200) 

m 

= nH{p) + ^E Aa«x(a) - nh{d) - nd\og{m - 1) + 0(1) 

a=l 

(201) 



where we used ( |388T l and ( fT95T l to obtain (|200] i. and d^OTT i is 
obtained by applying a Taylor series expansion to i7(p + A). 
The desired result in (|193b follows by substituting (1197) in 



VIll. Erased binary memoryless source 

Let 5*" G {0, 1}" be the output of the binary equiprobable 
source, X" be the output of the binary erasure channel with 
erasure rate 5 driven by 5*". The compressor only observes 
X", and the goal is to minimize the bit error rate with respect 
to S*". For d = |, codes with rate approaching the rate- 
distortion function were constructed in [.35] , For | < d < ^, 
the rate-distortion function is given by 



(1201b . applying a Taylor series expansion to — ^ 

in the vicinity of e and noting that i?„ is a finite constant. ■ 
The rate-dispersion function and the blocklength (ISTb re- 
quired to sustain R = l.lR{d) are plotted in Fig. |4] for a 
quaternary source with distribution [-j, -ji -ji Note that ac- 
cording to (ISTb . the blocklength required to approach l.lR{d) 
with a given probability of excess distortion grows rapidly as 

d y dmpiy:- 



R{d) = (1-5) log2-/i 




(202) 



Throughout the section, we assume | < d < | and < e < 1. 
Theorem 32 (Converse, BES). Any (n, M, d, e) code must 
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satisfy 

n 



k=0 

k 



n—k 



1 - Af2-("-'=) 



n — k 
[nd ~ j\ 



n + 



(203) 



Proof: Fix an {n,M,d,e) code (Pz"|jf"j^V"|Z'0- Even 
if the decompressor knows erasure locations, the probability 
that k erased bits are at Hamming distance £ from their 
representation is 



[k d{s'',Y'') ^ e\ x'' = (?...?)] 



(204) 



because given X'' — {7 . . .?), Si's are i.i.d. binary independent 
of 

The probability that n—k nonerased bits lie within 
Hamming distance £ from their representation can be upper 
bounded using Theorem [15] 



[{n - fc)d(5"-'=, < i I X"-'^ = S' 

n ~ k 



n—kl 



(205) 



Since the errors in the erased symbols are independent of the 
errors in the nonerased ones. 



P[d(S'",r") < d] 

n 

P[fc erasures in S"*] 



A;=0 
k 



< 



^ P [fc d{S'', Y'') = j\X'' =? . . .?] 

P [{n - fc)d(5"-'=, <nd- i|X"~*= = S""''] 



E 

fe 



n—k 



Theorem 33 (Achievability, BBS). There exists an (n, M, d, e) 
code such that 



k=0 
k 



M 



(207) 



Proof: Consider the ensemble of codes with M code- 
words drawn i.i.d. from the equiprobable distribution on 
{0, 1}". As discussed in the proof of Theorem |32] the 
distortion in the erased symbols does not depend on the 
codebook and is given by ( 12041 ). The probability that the 
Hamming distance between the nonerased symbols and their 



representation exceeds i, averaged over the code ensemble is 
found as in Theorem (TT] 

P [{n - /^)^^(S•"-^ C(f(X"-'=))) > els'" = X''] 

M 



= 1-2 



-fe 



(208) 

where C(m), m — 1, . . . , Af are i.i.d on {0, 1}". Averaging 
over the erasure channel, we have 

PK^",c(f(x")))) >d] 

n 

= ^ V[k erasures in S""] 

fe=0 

fe 

• ^ P [A; d{S\ C(f = ]\X^ =? . . .?] 

• P [(n - k)d{S''-'', C(f(X"-'=))) > 7irf - j\X''-'' = S"'"*^] 



fe=0 

fe 



<E 



j=o 



[nd - j\ 



M 



(209) 



Since there must exist at least one code whose excess- 
distortion probability is no larger than the average over the 
ensemble, there exists a code satisfying ( I207l i. ■ 

Theorem 34 (Gaussian approximation, BBS). The minimum 
achievable rate at blocklength n satisfies (182b where 



V{d) = 5{l - 5) log 



exp 



4 



A* = = log 



(210) 
(211) 



and the remainder term in (182b satisfies 

^ / 1 \ / log n\ 1 log n log' log n / 1 \ 
0[-\ <e[ < --^+ ^ ^ +0 - (212) 

\n J \ n J 2 n n \^/ 

Proof: Appendix |J] ■ 

Remark 11. It is satisfying to observe that even though 
Theorem [12] is not directly applicable, still V{d) = 
Var [js,x(S, X, c?)], where js,x(s,x, d) is spelled out in (12141) 
below. Indeed, since the rate-distortion function is achieved by 

P*(0) = P*(l) = i and 



b=a 



Pi\Yia\b)= {d- 



2 

d a 



(213) 



where a e {0,1,?} and b E {0,1}, we may adapt ([T7] | to 
obtain 

js,x(S,X,d) 

= «X;Y*(X,0) + A*d(S,0) - A*d (214) 

flog l+cxp(-A*) w.p. 1-5 
= - A*d + <^ A* w.p. I (215) 

[o w.p. I 
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Fig. 6. Bounds to R{n, d, e) and Gaussian approximation for BBS with 
<5 = 0.1, d = 0.1, e = 0.1 



The paiticularization of Theorem |7] to the GMS using 
yields the following result. 

Theorem 35 (Converse, GMS). Any (n, M, d, e) code must 
satisfy 

e > sup{P[g„(Z) > logAf + 7] -exp(-7)} (216) 
.g„(Z) - ^ log ^ + ^^ lege (217) 



Fig. 5. Rate-dispersion function (bits) and the blocklength )81t required to 
sustain R = l.l/?(d) provided that excess-distortion probability is bounded 
by t for BES with erasure rate <5 = 0.1. 



where Z ^ 
freedom). 



2 "° d 2 
X2 (i-^- chi square distributed with n degrees of 



The variance of ( 1215b is (I210l i. 

The rate-distortion and rate-dispersion functions are plotted 
in Fig. |5] Note that as d approaches |, the rate-dispersion 
function grows without limit. This should be expected, because 
for d = |, a code that reconstructs a sequence with vanishingly 
small excess-distortion probability does not exist, as about half 
of the erased bits will always be reconstructed incorrectly, 
regardless of the blocklength. 

The bounds in Theorems [32] and [33] as well as the approxi- 
mation in Theorem |34l are plotted in Fig. |6] The achievability 
and converse bounds are extremely tight. At blocklength 1000, 
the penalty over the rate-distortion function is 9%. 

IX. Gaussian memoryless source 

This section applies Theorems ItI |8] and [TOl to the i.i.d. Gaus- 
sian source with mean-square error distortion, d(a;",y") = 
Sr=i(^* — Ui)"^, and refines the second-order analysis in 
Theorem [12] Throughout the section, it is assumed that 
X., - 7V(0, cr^), < d < and < e < 1. 



The following result can be obtained by an application of 
Theorem [8] to the GMS. 

Theorem 36 (Converse, GMS). Any (n, M, d, e) code must 
satisfy 



where r„(e) is the solution to 

F[Z<n rl{e)] = 1 



(218) 



(219) 



and Z 



Xr. 



Proof Inequality ( 1218b simply states that the minimum 
number of n-dimensional balls of radius Vnd required to 
cover an rt-dimensional ball of radius y/narn{e) cannot be 
smaller than the ratio of their volumes. Since 



(220) 



i=l 



is Xn-distributed, the left side of (12 19b is the probability 
that the source produces a sequence that falls inside B, the 
n-dimensional ball of radius Jnarnit) with center at 0. 



18 



But as follows from the spherical symmetry of the Gaussian 
distribution, B has the smallest volume among all sets in 
M" having probability 1 — e. Since any {n, M, d, e)-code is 
a covering of a set that has total probability of at least 1 — e, 
the result follows. ■ 

Note that the proof of Theorem |36] can be formulated in the 
hypothesis testing language of Theorem [8] by choosing Q to 
be the Lebesgue measure on R". 

The following achievability result can be regarded as the 
rate-distortion counterpart to Shannon's geometric analysis of 
optimal coding for the Gaussian channel 136|. 

Theorem 37 (AchievabiUty, GMS). There exists an 
(n, M, d, e) code with 

e < n [1 — p{n, z)]^^ f^2 (nz) dz (221) 

where f^2 (•) is the Xn probability density function, and 



p{n,z) 



r(§ + i) 



y/nnV ( 
if < z < b^, where 



2 



1 - 



(1 + ^-2^)^ 



b = 



(72 



(222) 

(223) 
(224) 



and p{n, z) = otherwise. 

Proof: We compute an upper bound to ( iTOt for the 
specific case of the GMS. Let Py,. be the uniform distribution 
on the surface of the n-dimensional sphere with center at 
and radius 



ro = 



1-A 
a2 



(225) 



This choice corresponds to a positioning of representation 
points that is optimal in the limit of large n, see Fig. |3a), 
ID, ||25| . Indeed, for large n, most source sequences will be 
concentrated within a thin shell near the surface of the sphere 
of radius -^na-. The center of the sphere of radius Vnd must 
be at distance ro from the origin in order to cover the largest 
area of the surface of the sphere of radius ^/na. 

We proceed to lower-bound $P_{Y^n}(B_d(x^n))$, $x^n \in \mathbb{R}^n$. Observe that $P_{Y^n}(B_d(x^n)) = 0$ if $x^n$ is either too close to or too far from the origin, that is, if $|x^n| < \sqrt{n}\sigma a$ or $|x^n| > \sqrt{n}\sigma b$, where $|\cdot|$ denotes the Euclidean norm. To treat the more interesting case $\sqrt{n}\sigma a \le |x^n| \le \sqrt{n}\sigma b$, it is convenient to introduce the following notation.

• $S_n(r)$: surface area of an n-dimensional sphere of radius $r$;
• $S_n(r,\theta)$: surface area of an n-dimensional polar cap of radius $r$ and polar angle $\theta$.

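Similar to [8], [25], from Fig. 7(b),

$$S_n(r,\theta) \ge \frac{\pi^{\frac{n-1}{2}}}{\Gamma\left(\frac{n-1}{2}+1\right)}\left(r\sin\theta\right)^{n-1} \tag{226}$$

where the right side of (226) is the area of an $(n-1)$-dimensional disc of radius $r\sin\theta$. So if $\sqrt{n}\sigma a \le |x^n| = r \le \sqrt{n}\sigma b$,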



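$$P_{Y^n}(B_d(x^n)) \ge \frac{S_n(|x^n|,\theta)}{S_n(|x^n|)} \tag{227}$$

$$\ge \frac{\Gamma\left(\frac{n}{2}+1\right)}{\sqrt{\pi}\,n\,\Gamma\left(\frac{n-1}{2}+1\right)}\left(\sin\theta\right)^{n-1} \tag{228}$$

where $\theta$ is the angle in Fig. 7(b); by the law of cosines

$$\cos\theta = \frac{r^2 + r_0^2 - nd}{2 r r_0} \tag{229}$$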



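Finally, by Theorem 10 there exists an $(n, M, d, \epsilon)$ code with

$$\epsilon \le \mathbb{E}\left[\left(1-P_{Y^n}(B_d(X^n))\right)^M\right] \tag{230}$$

$$= \mathbb{E}\left[\left(1-P_{Y^n}(B_d(X^n))\right)^M 1\left\{\sqrt{n}\sigma a \le |X^n| \le \sqrt{n}\sigma b\right\}\right] + \mathbb{P}\left[|X^n| \notin \left[\sqrt{n}\sigma a, \sqrt{n}\sigma b\right]\right] \tag{231}$$

Since $|X^n|^2/\sigma^2$ is $\chi^2_n$-distributed, one obtains (221) by plugging $\sin^2\theta = 1-\cos^2\theta$ into (228) and substituting the latter in (231). ■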
Essentially, Theorem 37 evaluates the performance of Shannon's random code with all codewords lying on the surface of a sphere contained inside the sphere of radius $\sqrt{n}\sigma$. The following result allows us to bound the performance of a code whose codewords lie inside a ball of radius slightly larger than $\sqrt{n}\sigma$.

Theorem 38 (Rogers [37], Verger-Gaugry [38]). If $r > 1$ and $n \ge 2$, an n-dimensional sphere of radius $r$ can be covered by $\lfloor M(r)\rfloor$ spheres of radius 1, where $M(r)$ is defined in (232).

The first two cases in (232) are encompassed by the classical result of Rogers [37], which appears not to have been improved since 1963, while the last two are due to the recent improvement by Verger-Gaugry [38]. An immediate corollary to Theorem 38 is the following:



' e (n logg n + n logg logg n + 5n) r" 
n (n logg n + n logg logg n + 5n) r" 

IT/ \ _ 74i°gc7/T ^— riVn[(n-l) log^ rri+(n-l) log^ log^ n+i logj n+log^ ^^^] ^ 
__yri[(n-l) log„ rn+(n-l) log^ log^ n+i log^ n+loge J ^^2 ] n 



r 1- 



r > n 

n 

log^ n 

2 < r < 



< r < n 



logg n 

1 < r < 2 



(232) 
Fig. 7. Optimum positioning of the representation sphere (a) and the geometry of the excess-distortion probability calculation (b).



Theorem 39 (Achievability, GMS). For $n \ge 2$, there exists an $(n, M, d, \epsilon)$ code such that

$$M \le \left\lfloor M\!\left(\frac{\sigma}{\sqrt{d}}\,r_n(\epsilon)\right)\right\rfloor \tag{233}$$

where $r_n(\epsilon)$ is the solution to (219).

Proof: Theorem 38 implies that there exists a code with no more than $M\!\left(\frac{\sigma}{\sqrt{d}}\,r_n(\epsilon)\right)$ codewords such that all source sequences that fall inside $B$, the n-dimensional ball of radius $\sqrt{n}\,\sigma r_n(\epsilon)$ with center at 0, are reproduced within distortion $d$. The excess-distortion probability is therefore upper bounded by the probability that the source produces a sequence that falls outside $B$, which equals $\epsilon$ by (219). ■
Note that Theorem 39 studies the number of balls of radius $\sqrt{nd}$ to cover $B$ that is provably achievable, while the converse in Theorem 36 lower bounds the minimum number of balls of radius $\sqrt{nd}$ required to cover $B$ by the ratio of their volumes.

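Theorem 40 (Gaussian approximation, GMS). The minimum achievable rate at blocklength n satisfies

$$R(n,d,\epsilon) = \frac{1}{2}\log\frac{\sigma^2}{d} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon)\log e + \theta\left(\frac{\log n}{n}\right) \tag{234}$$

where the remainder term satisfies

$$O\left(\frac{1}{n}\right) \le \theta\left(\frac{\log n}{n}\right) \tag{235}$$

$$\theta\left(\frac{\log n}{n}\right) \le \frac{1}{2}\frac{\log n}{n} + \frac{\log\log n}{n} + O\left(\frac{1}{n}\right) \tag{236}$$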



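Proof: We start with the converse part, i.e., (235). Since in Theorem 36 $Z = \frac{1}{\sigma^2}\sum_{i=1}^n X_i^2$ with $X_i \sim \mathcal{N}(0,\sigma^2)$, we apply the Berry-Esseen CLT (Theorem 13) to $\frac{1}{\sigma^2}X_i^2$. Each $\frac{1}{\sigma^2}X_i^2$ has mean, second and third central moments equal to 1, 2 and 8, respectively. Let

$$r^2 = 1 + \sqrt{\frac{2}{n}}\,Q^{-1}\left(\epsilon + \frac{12\sqrt{2}}{\sqrt{n}}\right) \tag{237}$$

$$= 1 + \sqrt{\frac{2}{n}}\,Q^{-1}(\epsilon) + O\left(\frac{1}{n}\right) \tag{238}$$

Then by the Berry-Esseen inequality

$$\mathbb{P}\left[Z > n r^2\right] \ge \epsilon \tag{239}$$

and therefore $r_n(\epsilon)$ that achieves the equality in (219) must satisfy $r_n(\epsilon) \ge r$. Weakening (218) by plugging $r$ instead of $r_n(\epsilon)$ and taking logarithms of both sides therein, one obtains:

$$\log M \ge \frac{n}{2}\log\frac{\sigma^2 r^2}{d} \tag{240}$$

$$= \frac{n}{2}\log\frac{\sigma^2}{d} + \sqrt{\frac{n}{2}}\,Q^{-1}(\epsilon)\log e + O(1) \tag{241}$$

where (241) is a Taylor approximation of the right side of (240).

The achievability part (236) is proven in Appendix K using Theorem 37. Theorem 39 leads to the correct rate-dispersion term but a weaker remainder term. ■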
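For quick comparisons, the Gaussian approximation (234) can be evaluated directly. The sketch below uses base-2 logarithms and, as in the numerical comparison of Figures 8 and 9, replaces the remainder by $\frac12\frac{\log n}{n}$; that substitution is a heuristic, not a bound, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def gms_gaussian_approx(n: int, sigma2: float, d: float, eps: float) -> float:
    """Right side of (234) in bits/sample, with theta(log n / n) set to (1/2) log2(n) / n."""
    q_inv = NormalDist().inv_cdf(1.0 - eps)  # Q^{-1}(eps)
    return (0.5 * math.log2(sigma2 / d)
            + math.sqrt(1.0 / (2.0 * n)) * q_inv * math.log2(math.e)
            + 0.5 * math.log2(n) / n)
```

With $\sigma = 1$, $d = 1/4$ and $\epsilon = 0.1$, the approximation at $n = 1000$ is about 1.046 bits, and it approaches $R(d) = 1$ bit as $n$ grows.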

Figures 8 and 9 present a numerical comparison of Shannon's achievability bound (33) and the new bounds in (221), (233), (218) and (216), as well as the Gaussian approximation in (234), in which we took $\theta\left(\frac{\log n}{n}\right) = \frac{1}{2}\frac{\log n}{n}$. The achievability bound in (233) is tighter than the one in (221) at shorter blocklengths. Unsurprisingly, the converse bound in (218) is quite a bit tighter than the one in (216).

X. Conclusion 

To estimate the minimum rate required to sustain a given fidelity at a given blocklength, we have shown new achievability and converse bounds, which apply in full generality and which are tighter than existing bounds. The tightness of these bounds for stationary memoryless sources allowed us to obtain a compact closed-form expression that approximates the excess rate over the rate-distortion function incurred in the nonasymptotic regime (Theorem 12). For those sources and unless the blocklength is small, the rate dispersion (along with the rate-distortion function) serves to give tight approximations to the fundamental fidelity-rate tradeoff.
Fig. 8. Bounds to $R(n,d,\epsilon)$ and Gaussian approximation for GMS with

Fig. 9. Bounds to $R(n,d,\epsilon)$ and Gaussian approximation for GMS with $\sigma = 1$, $d = 1/4$, $\epsilon = 10^{-4}$.
Appendix A
Hypothesis testing
and almost lossless data compression

To show (64), without loss of generality, assume that the letters of the alphabet $\mathcal{A}$ are labeled $1, 2, \ldots$ in order of decreasing probabilities:

$$P_X(1) \ge P_X(2) \ge \ldots \tag{242}$$

Observe that

$$M^*(0,\epsilon) = \min\left\{m \ge 1 : \mathbb{P}[X \le m] \ge 1-\epsilon\right\}, \tag{243}$$

and the optimal randomized test to decide between $P_X$ and $U$ is given by

$$P_{Z|X}(1|a) = \begin{cases} 1, & a \le M^*(0,\epsilon) - 1 \\ \alpha, & a = M^*(0,\epsilon) \\ 0, & a \ge M^*(0,\epsilon) + 1 \end{cases} \tag{244}$$

It follows that

$$\beta_{1-\epsilon}(P_X, U) = M^*(0,\epsilon) - 1 + \alpha \tag{245}$$

where $\alpha \in (0,1]$ is the solution to

$$\mathbb{P}\left[X \le M^*(0,\epsilon) - 1\right] + \alpha P_X\left(M^*(0,\epsilon)\right) = 1-\epsilon, \tag{246}$$

hence (64).
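The construction (243)–(246) is easy to mechanize for a finite alphabet. The sketch below (function name hypothetical, valid for $0 < \epsilon < 1$) sorts the probabilities as in (242), finds $M^*(0,\epsilon)$ and the randomization $\alpha$, and returns $\beta_{1-\epsilon}(P_X,U)$ from (245).

```python
import math


def mstar_and_beta(probs, eps):
    """Return (M*(0, eps), beta_{1-eps}(P_X, U)) following (242)-(246)."""
    p = sorted(probs, reverse=True)      # relabel letters by decreasing probability (242)
    target = 1.0 - eps
    cum = 0.0                            # P[X <= m - 1]
    for m, pm in enumerate(p, start=1):
        if cum + pm >= target:           # smallest m satisfying (243)
            alpha = (target - cum) / pm  # solves (246)
            return m, m - 1 + alpha      # (245)
        cum += pm
    # Reached only through floating-point round-off when eps is (numerically) zero.
    return len(p), float(len(p))
```

For $P_X = (1/2, 1/4, 1/8, 1/8)$ and $\epsilon = 0.3$ this gives $M^*(0,\epsilon) = 2$ and $\beta_{1-\epsilon} = 1.8$, consistent with $\log\beta_{1-\epsilon} \le \log M^*(0,\epsilon) \le \log(\beta_{1-\epsilon}+1)$.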

Appendix B
Gaussian approximation analysis
of almost lossless data compression

In this appendix we strengthen the remainder term in Theorem 12 for $d = 0$ (cf. (91)). Taking the logarithm of (64), we have

$$\log\beta_{1-\epsilon}(P_X,U) \le \log M^*(0,\epsilon) \tag{247}$$

$$\le \log\left(\beta_{1-\epsilon}(P_X,U) + 1\right) \tag{248}$$

$$= \log\beta_{1-\epsilon}(P_X,U) + \log\left(1 + \frac{1}{\beta_{1-\epsilon}(P_X,U)}\right) \tag{249}$$

$$\le \log\beta_{1-\epsilon}(P_X,U) + \frac{\log e}{\beta_{1-\epsilon}(P_X,U)} \tag{250}$$

where in (250) we used $\log(1+x) \le x\log e$, $x > -1$.

Let $P_{X^n} = P_X \times \ldots \times P_X$ be the source distribution, and let $U^n$ be the counting measure on $\mathcal{A}^n$. Examining the proof of Lemma 58 of [27] on the asymptotic behavior of $\beta_{1-\epsilon}(P,Q)$, it is not hard to see that it extends naturally to σ-finite $Q$'s; thus if $\mathrm{Var}[\imath_X(X)] > 0$,

$$\log\beta_{1-\epsilon}(P_{X^n},U^n) = nH(X) + \sqrt{n\mathrm{Var}[\imath_X(X)]}\,Q^{-1}(\epsilon) - \frac{1}{2}\log n + O(1) \tag{251}$$

and if $\mathrm{Var}[\imath_X(X)] = 0$,

$$\log\beta_{1-\epsilon}(P_{X^n},U^n) = nH(X) - \log\frac{1}{1-\epsilon} \tag{252}$$

Letting $P_{X^n}$ and $U^n$ play the roles of $P_X$ and $U$ in (247) and (250) and invoking (251) and (252), we obtain (88) and (89), respectively.
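To see (251) at work, the sketch below computes $M^*(0,\epsilon)$ exactly for a Bernoulli($p$) source with $p < 1/2$. Strings with fewer ones are more likely, so whole type classes are added in order of increasing weight, and only part of the last class is needed. It then compares $\log_2 M^*(0,\epsilon)$ with $nH(X) + \sqrt{n\mathrm{Var}[\imath_X(X)]}\,Q^{-1}(\epsilon) - \frac12\log_2 n$. The source parameters and function names are illustrative assumptions; by (247)–(251) the two quantities should differ only by $O(1)$.

```python
import math
from statistics import NormalDist


def log2_mstar_bernoulli(n: int, p: float, eps: float) -> float:
    """Exact log2 M*(0, eps) for n i.i.d. Bernoulli(p) symbols, 0 < p < 1/2."""
    target = 1.0 - eps
    cum, count = 0.0, 0
    for k in range(n + 1):                       # weight-k strings, most likely first
        p_seq = p ** k * (1.0 - p) ** (n - k)    # probability of one string of weight k
        size = math.comb(n, k)
        if cum + size * p_seq >= target:
            needed = math.ceil((target - cum) / p_seq)  # strings taken from this class
            return math.log2(count + needed)
        cum += size * p_seq
        count += size
    return math.log2(count)


def gaussian_approx_bits(n: int, p: float, eps: float) -> float:
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # H(X) in bits
    var = p * (1 - p) * math.log2((1 - p) / p) ** 2     # Var[i_X(X)] in bits^2
    return n * h + math.sqrt(n * var) * NormalDist().inv_cdf(1 - eps) - 0.5 * math.log2(n)
```

With $n = 1000$, $p = 0.11$ and $\epsilon = 0.1$ both quantities are close to 533 bits, noticeably above $nH(X) \approx 500$ bits.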
Appendix C
Generalization of Theorems 7 and 12

We show that even if the rate-distortion function is not achieved by any output distribution, the definition of d-tilted information can be extended appropriately, so that Theorem 7 and the converse part of Theorem 12 still hold.

We use the following general representation of the rate-distortion function due to Csiszár [3].

Theorem 41 (Alternative representation of $R(d)$ [3]). Under the basic restrictions (a)–(b) of Section II-B, for each $d > d_{\min}$, it holds that

$$R_X(d) = \max_{\alpha(x),\,\lambda}\left\{\mathbb{E}[\alpha(X)] - \lambda d\right\} \tag{253}$$

where the maximization is over $\alpha(x) \ge 0$ and $\lambda \ge 0$ satisfying the constraint

$$\mathbb{E}\left[\exp\left\{\alpha(X) - \lambda d(X,y)\right\}\right] \le 1 \quad \forall y \in \mathcal{B} \tag{254}$$

Let $(\alpha^*(x), \lambda^*)$ achieve the maximum in (253) for some $d > d_{\min}$, and define the d-tilted information in $x$ by

$$\jmath_X(x,d) = \alpha^*(x) - \lambda^* d \tag{255}$$

Note that (19), the only property of d-tilted information we used in the proof of Theorem 7, still holds due to (254); thus Theorem 7 remains true.

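The proof of the converse part of Theorem 12 generalizes immediately upon making the following two observations. First, (17) is still valid due to (253). Second, d-tilted information in (255) still single-letterizes for memoryless sources:

Lemma 3. Under restrictions (i) and (ii) in Section IV-B, (103) holds.

Proof: Let $(\alpha^*(x), \lambda^*)$ attain the maximum in (253) for the single-letter distribution $P_X$. It suffices to check that $\left(\sum_{i=1}^n \alpha^*(x_i),\, n\lambda^*\right)$ attains the maximum in (253) for $P_{X^n} = P_X \times \ldots \times P_X$. As desired,

$$\mathbb{E}\left[\sum_{i=1}^n \alpha^*(X_i)\right] - n\lambda^* d = nR_X(d) = R_{X^n}(d) \tag{256}$$

and we just need to verify that the constraints in (254) are satisfied:

$$\mathbb{E}\left[\exp\left\{\sum_{i=1}^n \alpha^*(X_i) - \lambda^*\sum_{i=1}^n d(X_i,y_i)\right\}\right] = \prod_{i=1}^n \mathbb{E}\left[\exp\left\{\alpha^*(X_i) - \lambda^* d(X_i,y_i)\right\}\right] \tag{257}$$

$$\le 1 \quad \forall y^n \in \mathcal{B}^n \tag{258}$$

■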



Appendix D
Proof of Lemma 2

Before we prove Lemma 2, let us present some background results we will use. For $k = 1, 2, \ldots$, denote

$$d_{Y,k}(x,\lambda) = \frac{\mathbb{E}\left[d^k(x,Y)\exp(-\lambda d(x,Y))\right]}{\mathbb{E}\left[\exp(-\lambda d(x,Y))\right]} \tag{259}$$

Observe that

$$d_{Y,k}(x,0) = \mathbb{E}\left[d^k(x,Y)\right] \tag{260}$$

(the expectations in (259) and (260) are with respect to the unconditional distribution of $Y$). Denoting by $(\cdot)'$ differentiation with respect to $\lambda > 0$, we state the following properties, whose proofs can be found in [20].

A. $\left(\mathbb{E}[\Lambda_Y(X,\lambda)]\right)'\big|_{\lambda=\lambda_{X,Y}} = 0$, where $\lambda_{X,Y} = -R'_{X,Y}(d)$.

B. $\mathbb{E}[\Lambda''_Y(X,\lambda)] < 0$ for all $\lambda > 0$ if $\mathbb{E}[d_{Y,1}(X,0)] < \infty$.

C. $\Lambda'_Y(x,\lambda) = -d + d_{Y,1}(x,\lambda)$.

D. $\Lambda''_Y(x,\lambda) = \left[d^2_{Y,1}(x,\lambda) - d_{Y,2}(x,\lambda)\right](\log e)^{-1} \le 0$ if $d_{Y,1}(x,0) < \infty$.

E. $d'_{Y,k}(x,\lambda) \le 0$ if $d_{Y,k}(x,0) < \infty$.

F. $d_{\min|X,Y} = \mathbb{E}[\alpha_Y(X)]$, where $\alpha_Y(x) = \operatorname{ess\,inf} d(x,Y)$.

Remark 12. By Properties A and B,

$$\mathbb{E}[\Lambda_Y(X,\lambda_{X,Y})] = \sup_{\lambda>0}\mathbb{E}[\Lambda_Y(X,\lambda)] \tag{261}$$

Remark 13. Properties C and D imply that

$$-d \le \Lambda'_Y(x,\lambda) \le -d + d_{Y,1}(x,0) \tag{262}$$

Therefore, as long as $\mathbb{E}[d_{Y,1}(X,0)] < \infty$, the differentiation in Property A can be brought inside the expectation invoking the dominated convergence theorem. Keeping this in mind while averaging the equation in Property C with $\lambda = \lambda_{X,Y}$ with respect to $P_X$, we observe that

$$\mathbb{E}[d_{Y,1}(X,\lambda_{X,Y})] = d \tag{263}$$

Remark 14. Properties (17) and (18) of d-tilted information imply that the equality in (263) holds if $\lambda_{X,Y}$ is replaced by $\lambda^* = -R'_X(d)$ and $Y$ is replaced by $Y^*$, the $R_X(d)$-achieving random variable. It follows that

$$\lambda^* = \lambda_{X,Y^*} \tag{264}$$

Remark 15. By virtue of Properties D and E, we have

$$-d_{Y,2}(x,0) \le \Lambda''_Y(x,\lambda)\log e \le 0 \tag{265}$$

Remark 16. Using (263), derivatives of $R_{X,Y}(d)$ are conveniently expressed via $\mathbb{E}[d_{Y,k}(X,\lambda_{X,Y})]$; in particular, at any $d_{\min|X,Y} < d < d_{\max|X,Y} = \mathbb{E}[d_{Y,1}(X,0)]$, we have

$$R''_{X,Y}(d) = -\frac{\partial\lambda_{X,Y}}{\partial d} \tag{266}$$

$$= -\frac{1}{\mathbb{E}\left[d'_{Y,1}(X,\lambda_{X,Y})\right]} \tag{267}$$

$$= \frac{\log e}{\mathbb{E}\left[d_{Y,2}(X,\lambda_{X,Y})\right] - \mathbb{E}\left[d^2_{Y,1}(X,\lambda_{X,Y})\right]} \tag{268}$$

$$> 0 \tag{269}$$

where (268) holds by Property D and the dominated convergence theorem due to (265) as long as $\mathbb{E}[d_{Y,2}(X,0)] < \infty$, and (269) is by Property B.
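Properties C and D and Remark 15 can be sanity-checked numerically. The sketch below assumes natural logarithms (so $\log e = 1$) and takes $\Lambda_Y(x,\lambda) = -\lambda d - \ln\mathbb{E}\left[\exp(-\lambda d(x,Y))\right]$, which is the form consistent with Properties C and D as stated. The three-point distribution of $Y$, the squared-error distortion, the distortion level and all names are hypothetical choices for illustration. Finite differences of $\Lambda_Y$ are compared with $-d + d_{Y,1}(x,\lambda)$ and $d^2_{Y,1}(x,\lambda) - d_{Y,2}(x,\lambda)$.

```python
import math

# Hypothetical example: Y takes three values; squared-error distortion.
Y_VALUES = [0.0, 1.0, 2.0]
Y_PROBS = [0.2, 0.5, 0.3]
D_LEVEL = 0.3  # the distortion level d


def dist(x: float, y: float) -> float:
    return (x - y) ** 2


def tilted_moment(x: float, lam: float, k: int) -> float:
    """d_{Y,k}(x, lambda) as in (259)."""
    num = sum(p * dist(x, y) ** k * math.exp(-lam * dist(x, y)) for y, p in zip(Y_VALUES, Y_PROBS))
    den = sum(p * math.exp(-lam * dist(x, y)) for y, p in zip(Y_VALUES, Y_PROBS))
    return num / den


def big_lambda(x: float, lam: float) -> float:
    """Assumed form: Lambda_Y(x, lambda) = -lambda d - ln E[exp(-lambda d(x, Y))], in nats."""
    mgf = sum(p * math.exp(-lam * dist(x, y)) for y, p in zip(Y_VALUES, Y_PROBS))
    return -lam * D_LEVEL - math.log(mgf)


def check_properties(x: float, lam: float):
    """Return (numeric Lambda', Property C value, numeric Lambda'', Property D value)."""
    h1, h2 = 1e-5, 1e-4
    first = (big_lambda(x, lam + h1) - big_lambda(x, lam - h1)) / (2 * h1)
    second = (big_lambda(x, lam + h2) - 2 * big_lambda(x, lam) + big_lambda(x, lam - h2)) / h2 ** 2
    d1, d2 = tilted_moment(x, lam, 1), tilted_moment(x, lam, 2)
    return first, -D_LEVEL + d1, second, d1 ** 2 - d2
```

Since $\Lambda''_Y$ equals minus the variance of $d(x,Y)$ under the tilted distribution, it is nonpositive, and it is bounded below by $-d_{Y,2}(x,0)$ as in (265).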

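The proof of Lemma 2 consists of Gaussian approximation analysis of the bound in Lemma 1. First, we weaken the bound in Lemma 1 by choosing $P_{\bar X}$ and $\gamma$ in (72) in the following manner. Fix $\tau > 0$, and let $\gamma = \frac{\tau}{n}$, $P_{Y^n} = P_{Y^{n*}} = P_{Y^*} \times \ldots \times P_{Y^*}$, where $Y^*$ achieves $R_X(d)$, and choose $P_{\bar X^n} = P_{\hat X} \times \ldots \times P_{\hat X}$, where $P_{\hat X}$ is the measure on $\mathcal{A}$ generated by the empirical distribution of $x^n \in \mathcal{A}^n$:

$$P_{\hat X}(a) = \frac{1}{n}\sum_{i=1}^n 1\{x_i = a\} \tag{270}$$

Since the distortion measure is separable, for any $\lambda > 0$ we have

$$\Lambda_{Y^{n*}}(x^n, n\lambda) = \sum_{i=1}^n \Lambda_{Y^*}(x_i,\lambda) \tag{271}$$

so by Lemma 1, for all

$$d > d_{\min|X^n,Y^{n*}} \tag{272}$$

it holds that

$$P_{Y^{n*}}(B_d(x^n)) \ge \exp\left(-\sum_{i=1}^n \Lambda_{Y^*}(x_i,\lambda(x^n)) - \lambda(x^n)\tau\right)\mathbb{P}\left[nd - \tau < \sum_{i=1}^n d(x_i,Z_i^*) \le nd \,\Big|\, X^n = x^n\right] \tag{273}$$

where we denoted

$$\lambda(x^n) = -R'_{\hat X,Y^*}(d) \tag{274}$$

($\lambda(x^n)$ depends on $x^n$ through the distribution of $\hat X$ in (270)), and $P_{\bar Z^{n*}|X^n=x^n} = P_{Z^*|\hat X=x_1} \times \ldots \times P_{Z^*|\hat X=x_n}$, where $P_{Z^*|\hat X}$ achieves $R_{\hat X,Y^*}(d)$. The probability appearing in (273) can be lower bounded by the following lemma.

Lemma 4. Assume that restrictions (i)–(iv) in Section IV-B hold. Then, there exist $\delta_0, n_0 > 0$ such that for all $\delta < \delta_0$, $n \ge n_0$, there exist a set $F_n \subseteq \mathcal{A}^n$ and constants $\tau, C_1, K_1 > 0$ such that

$$\mathbb{P}[X^n \notin F_n] \le \frac{K_1}{\sqrt{n}} \tag{275}$$

and for all $x^n \in F_n$,

$$\mathbb{P}\left[nd - \tau < \sum_{i=1}^n d(x_i,Z_i^*) \le nd \,\Big|\, X^n = x^n\right] \ge \frac{C_1}{\sqrt{n}} \tag{276}$$

$$|\lambda(x^n) - \lambda^*| \le \delta \tag{277}$$

where $\lambda^* = -R'_X(d)$.

Proof: The reasoning is similar to the proof of [20, (4.6)]. Fix

$$0 < \Delta < \frac{1}{3}\min\left\{d - d_{\min|X,Y^*},\ d_{\max|X,Y^*} - d\right\} \tag{278}$$

(the right side of (278) is positive by restriction (ii) in Section IV-B) and denote

$$\underline{\lambda} = -R'_{X,Y^*}\left(d + \frac{3\Delta}{2}\right) \tag{279}$$

$$\bar\lambda = -R'_{X,Y^*}\left(d - \frac{3\Delta}{2}\right) \tag{280}$$

$$\mu'' = \mathbb{E}\left[\left|\Lambda''_{Y^*}(X,\lambda^*)\right|\right] \tag{281}$$

$$\delta = \frac{3\Delta}{2}\sup_{|\theta| \le \frac{3\Delta}{2}} R''_{X,Y^*}(d+\theta) \tag{282}$$

$$\bar V(x^n) = \frac{1}{n}\sum_{i=1}^n \sup_{|\theta| \le \delta}\left|\Lambda''_{Y^*}(x_i,\lambda^*+\theta)\right|\log e \tag{283}$$

$$\underline{V}(x^n) = \frac{1}{n}\sum_{i=1}^n \inf_{|\theta| \le \delta}\left|\Lambda''_{Y^*}(x_i,\lambda^*+\theta)\right|\log e \tag{284}$$

We say that $x^n \in F_n$ if it meets the following conditions:

$$\frac{1}{n}\sum_{i=1}^n \alpha_{Y^*}(x_i) \le d_{\min|X,Y^*} + \Delta \tag{285}$$

$$\frac{1}{n}\sum_{i=1}^n d_{Y^*,1}(x_i,0) \ge d_{\max|X,Y^*} - \Delta \tag{286}$$

$$\frac{1}{n}\sum_{i=1}^n d_{Y^*,1}(x_i,\underline{\lambda}) \ge d + \Delta \tag{287}$$

$$\frac{1}{n}\sum_{i=1}^n d_{Y^*,1}(x_i,\bar\lambda) \le d - \Delta \tag{288}$$

$$\frac{1}{n}\sum_{i=1}^n d_{Y^*,3}(x_i,0) \le \mathbb{E}\left[d_{Y^*,3}(X,0)\right] + \Delta \tag{289}$$

$$\underline{V}(x^n) \ge \frac{\mu''}{2}\log e \tag{290}$$

$$\bar V(x^n) \le 2\mu''\log e \tag{291}$$

Let us first show that (277) holds with $\delta$ given by (282) for all $x^n$ satisfying the conditions (285)–(288). From (287) and (288),

$$\frac{1}{n}\sum_{i=1}^n d_{Y^*,1}(x_i,\bar\lambda) < d < \frac{1}{n}\sum_{i=1}^n d_{Y^*,1}(x_i,\underline{\lambda}) \tag{292}$$

On the other hand, from (263) we have

$$d = \frac{1}{n}\sum_{i=1}^n d_{Y^*,1}(x_i,\lambda(x^n)) \tag{293}$$

Therefore, since the right side of (293) is decreasing (Property E),

$$\underline{\lambda} < \lambda(x^n) < \bar\lambda \tag{294}$$

Finally, an application of Taylor's theorem to (279) and (280) using (264) expands (294) as

$$\lambda^* - \frac{3\Delta}{2}R''_{X,Y^*}(\tilde d) < \lambda(x^n) < \lambda^* + \frac{3\Delta}{2}R''_{X,Y^*}(\hat d) \tag{295}$$

for some $\tilde d \in \left[d, d + \frac{3\Delta}{2}\right]$, $\hat d \in \left[d - \frac{3\Delta}{2}, d\right]$. Note that (278), (285) and (286) ensure that

$$d_{\min|\hat X,Y^*} + 2\Delta < d < d_{\max|\hat X,Y^*} - 2\Delta \tag{296}$$

so the derivatives in (295) exist and are positive by Remark 16. Therefore, (277) holds with $\delta$ given by (282).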

We are now ready to show that as long as $\Delta$ (and, therefore, $\delta$) is small enough, there exists a $K_1 > 0$ such that (275) holds. Hölder's inequality and assumption (iv) in Section IV-B imply that the third moments of the random variables involved in conditions (287)–(289) are finite. By the Berry-Esseen inequality, the probability of violating these conditions is $O\left(\frac{1}{\sqrt{n}}\right)$. To bound the probability of violating conditions (290) and (291), observe that since $\Lambda''_{Y^*}(X,\lambda)$ is dominated by integrable functions due to (265), we have by Fatou's lemma and continuity of $\Lambda''_{Y^*}(x,\cdot)$

$$\mu'' \le \liminf_{\delta\downarrow 0}\mathbb{E}\left[\inf_{|\theta'| \le \delta}\left|\Lambda''_{Y^*}(X,\lambda^*+\theta')\right|\right] \tag{297}$$

$$\le \limsup_{\delta\downarrow 0}\mathbb{E}\left[\sup_{|\theta'| \le \delta}\left|\Lambda''_{Y^*}(X,\lambda^*+\theta')\right|\right] \tag{298}$$

$$\le \mu'' \tag{299}$$

Therefore, if $\delta$ is small enough,

$$\frac{3\mu''}{4}\log e < \mathbb{E}\left[\underline{V}(X^n)\right] \le \mathbb{E}\left[\bar V(X^n)\right] < \frac{3\mu''}{2}\log e \tag{300}$$

The third absolute moments of $\underline{V}(X^n)$ and $\bar V(X^n)$ are finite by Hölder's inequality, (265) and assumption (iv) in Section IV-B. Thus, the probability of violating conditions (290) and (291) is also $O\left(\frac{1}{\sqrt{n}}\right)$. Now, (275) follows via the union bound.

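To complete the proof of Lemma 4, it remains to show (276). Toward this end, observe, recalling Properties D and E, that the corresponding moments in the Berry-Esseen theorem are given by

$$D(x^n) = \frac{1}{n}\sum_{i=1}^n \mathbb{E}\left[d(x_i,Z^*)\,|\,\hat X = x_i\right] \tag{301}$$

$$= \frac{1}{n}\sum_{i=1}^n d_{Y^*,1}(x_i,\lambda(x^n)) \tag{302}$$

$$= d \tag{303}$$

$$V(x^n) = \frac{1}{n}\sum_{i=1}^n\left[d_{Y^*,2}(x_i,\lambda(x^n)) - d^2_{Y^*,1}(x_i,\lambda(x^n))\right] \tag{304}$$

$$= -\frac{1}{n}\sum_{i=1}^n \Lambda''_{Y^*}(x_i,\lambda(x^n))\log e \tag{305}$$

$$T(x^n) = \frac{1}{n}\sum_{i=1}^n \mathbb{E}\left[\left|d(x_i,Z^*) - \mathbb{E}\left[d(x_i,Z^*)\,|\,\hat X = x_i\right]\right|^3 \,\Big|\, \hat X = x_i\right] \tag{306}$$

$$\le \frac{8}{n}\sum_{i=1}^n \mathbb{E}\left[d^3(x_i,Z^*)\,|\,\hat X = x_i\right] \tag{307}$$

$$= \frac{8}{n}\sum_{i=1}^n d_{Y^*,3}(x_i,\lambda(x^n)) \tag{308}$$

$$\le \frac{8}{n}\sum_{i=1}^n d_{Y^*,3}(x_i,0) \tag{309}$$

Due to (277), (290) and (291), $\frac{\mu''}{2}\log e \le V(x^n) \le 2\mu''\log e$ as long as $x^n \in F_n$. Furthermore,

$$T(x^n) \le 8\,\mathbb{E}\left[d_{Y^*,3}(X,0)\right] + 8\Delta \tag{310}$$

for such $x^n$ due to (289). Therefore, by the Berry-Esseen inequality we have for all $x^n \in F_n$:

$$\mathbb{P}\left[nd - \tau < \sum_{i=1}^n d(x_i,Z_i^*) \le nd \,\Big|\, X^n = x^n\right] \ge \frac{1}{\sqrt{2\pi}}\int_0^{\frac{\tau}{\sqrt{nV(x^n)}}} e^{-\frac{u^2}{2}}\,du - \frac{12\,T(x^n)}{V^{3/2}(x^n)\sqrt{n}} \tag{311}$$

$$\ge \frac{\tau}{\sqrt{2\pi n V(x^n)}}\,e^{-\frac{\tau^2}{2nV(x^n)}} - \frac{12\,T(x^n)}{V^{3/2}(x^n)\sqrt{n}} \tag{312}$$

$$\ge \frac{\tau}{\sqrt{4\pi n\mu''\log e}}\,e^{-\frac{\tau^2}{n\mu''\log e}} - \frac{B}{\sqrt{n}} \tag{313}$$

$$= \frac{1}{\sqrt{n}}\left(\frac{\tau}{\sqrt{4\pi\mu''\log e}}\,e^{-\frac{\tau^2}{n\mu''\log e}} - B\right) \tag{314}$$

where $B = \frac{96\left(\mathbb{E}\left[d_{Y^*,3}(X,0)\right] + \Delta\right)}{\left(\frac{\mu''}{2}\log e\right)^{3/2}}$. The proof is complete upon observing that as long as $n$ is large enough, we can always choose $\tau > 0$ so that (314) is positive. ■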
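To upper-bound $\sum_{i=1}^n \Lambda_{Y^*}(x_i,\lambda(x^n))$ appearing in (273), we invoke the following result.

Lemma 5. Assume that restrictions (i)–(iv) in Section IV-B hold. There exist constants $n_0, K_2 > 0$ such that for $n \ge n_0$,

$$\mathbb{P}\left[\sum_{i=1}^n \Lambda_{Y^*}(X_i,\lambda(X^n)) \le \sum_{i=1}^n \Lambda_{Y^*}(X_i,\lambda^*) + C_2\log n\right] > 1 - \frac{K_2}{\sqrt{n}} \tag{315}$$

where

$$C_2 = \frac{\mathrm{Var}\left[\Lambda'_{Y^*}(X,\lambda^*)\right]}{\mathbb{E}\left[\left|\Lambda''_{Y^*}(X,\lambda^*)\right|\right]\log e} \tag{316}$$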

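Proof: Using (277), we have for all $x^n \in F_n$

$$\sum_{i=1}^n\left[\Lambda_{Y^*}(x_i,\lambda(x^n)) - \Lambda_{Y^*}(x_i,\lambda^*)\right] = \sup_{|\theta| \le \delta}\sum_{i=1}^n\left[\Lambda_{Y^*}(x_i,\lambda^*+\theta) - \Lambda_{Y^*}(x_i,\lambda^*)\right] \tag{317}$$

$$= \sup_{|\theta| \le \delta}\left\{\theta\sum_{i=1}^n \Lambda'_{Y^*}(x_i,\lambda^*) + \frac{\theta^2}{2}\sum_{i=1}^n \Lambda''_{Y^*}(x_i,\lambda^*+\xi_n)\right\} \tag{318}$$

$$\le \sup_{|\theta| \le \delta}\left\{\theta S'(x^n) - \frac{\theta^2}{2}S''(x^n)\right\} \tag{319}$$

$$\le \frac{\left(S'(x^n)\right)^2}{2S''(x^n)} \tag{320}$$

where

• (317) is due to (261);
• (318) holds for some $|\xi_n| \le \delta$ by Taylor's theorem;
• in (319) we denoted
$$S'(x^n) = \sum_{i=1}^n \Lambda'_{Y^*}(x_i,\lambda^*) \tag{321}$$
$$S''(x^n) = \sum_{i=1}^n \inf_{|\theta'| \le \delta}\left|\Lambda''_{Y^*}(x_i,\lambda^*+\theta')\right| \tag{322}$$
and used Property D;
• in (320) we maximized the quadratic expression in (319) with respect to $\theta$.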
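The step from (319) to (320) is the maximization of a concave quadratic: for $S'' > 0$, $\theta S' - \frac{\theta^2}{2}S''$ is maximized at $\theta = S'/S''$ with value $\frac{(S')^2}{2S''}$, and dropping the constraint $|\theta| \le \delta$ can only increase the supremum. A brute-force check of this identity, with illustrative numbers and hypothetical function names:

```python
def quadratic_sup_closed_form(s1: float, s2: float) -> float:
    """sup over real theta of theta*s1 - theta^2*s2/2, valid for s2 > 0."""
    return s1 * s1 / (2.0 * s2)


def quadratic_sup_grid(s1: float, s2: float, lo: float = -10.0, hi: float = 10.0,
                       steps: int = 20000) -> float:
    """Brute-force maximum of the same quadratic over a uniform grid on [lo, hi]."""
    best = float("-inf")
    for i in range(steps + 1):
        theta = lo + (hi - lo) * i / steps
        best = max(best, theta * s1 - 0.5 * theta * theta * s2)
    return best
```

Restricting the grid to a small interval around zero, as the constraint $|\theta| \le \delta$ does, never exceeds the closed-form value.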

Note that the reasoning leading to (320) is due to [7, proof of Theorem 3]. We now proceed to upper-bound the ratio in the right side of (320). Since $\mathbb{E}[d_{Y^*,1}(X,0)] < \infty$ by assumption (iii) in Section IV-B, the differentiation in Property A can be brought inside the expectation by (262) and the dominated convergence theorem, so

$$\mathbb{E}\left[\frac{1}{n}S'(X^n)\right] = \mathbb{E}\left[\Lambda'_{Y^*}(X,\lambda^*)\right] = 0 \tag{323}$$

Denote

$$V' = \mathrm{Var}\left[\Lambda'_{Y^*}(X,\lambda^*)\right] \tag{324}$$

$$T' = \mathbb{E}\left[\left|\Lambda'_{Y^*}(X,\lambda^*) - \mathbb{E}\left[\Lambda'_{Y^*}(X,\lambda^*)\right]\right|^3\right] \tag{325}$$

If $V' = 0$, there is nothing to prove, as that means $S'(X^n) = 0$ a.s. Otherwise, since (262) with Hölder's inequality and assumption (iv) in Section IV-B guarantee that $T'$ is finite, the Berry-Esseen inequality (95) implies



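$$\mathbb{P}\left[\left(S'(X^n)\right)^2 > V' n\log_e n\right] \le 2Q\left(\sqrt{\log_e n}\right) + \frac{12T'}{V'^{3/2}\sqrt{n}} \tag{326}$$

$$\le \sqrt{\frac{2}{\pi\log_e n}}\,\frac{1}{\sqrt{n}} + \frac{12T'}{V'^{3/2}\sqrt{n}} \tag{327}$$

$$\le \left(\sqrt{\frac{2}{\pi\log_e 2}} + \frac{12T'}{V'^{3/2}}\right)\frac{1}{\sqrt{n}} \tag{328}$$

$$= \frac{K'_2}{\sqrt{n}} \tag{329}$$

In (327), we used

$$Q(t) \le \frac{1}{\sqrt{2\pi}\,t}\,e^{-\frac{t^2}{2}}, \quad t > 0 \tag{330}$$

and (328) obviously holds for $n \ge 2$. To treat $S''(X^n)$, observe that $S''(x^n) = n(\log e)^{-1}\underline{V}(x^n)$ (see (284)), so, as before, the variance $V''$ and the third absolute moment $T''$ of $Z'' = \inf_{|\theta'| \le \delta}\left|\Lambda''_{Y^*}(X,\lambda^*+\theta')\right|$ are finite, and $\mathbb{E}[Z''] > \frac{3\mu''}{4}$ by (300), where $\mu'' > 0$ is defined in (281). If $V'' = 0$, we have $Z'' > \frac{3\mu''}{4}$ almost surely. Otherwise, by the Berry-Esseen inequality (95),

$$\mathbb{P}\left[S''(X^n) < \frac{n\mu''}{2}\right] \le Q\left(\frac{\mu''\sqrt{n}}{4\sqrt{V''}}\right) + \frac{6T''}{V''^{3/2}\sqrt{n}} \tag{331}$$

$$\le \frac{4\sqrt{V''}}{\mu''\sqrt{2\pi n}}\,e^{-\frac{n\mu''^2}{32V''}} + \frac{6T''}{V''^{3/2}\sqrt{n}} \tag{332}$$

$$\le \frac{K''_2}{\sqrt{n}} \tag{333}$$

where in (332) we used (330). Finally, denoting

$$g(x^n) = \sum_{i=1}^n \Lambda_{Y^*}(x_i,\lambda(x^n)) - \sum_{i=1}^n \Lambda_{Y^*}(x_i,\lambda^*) \tag{334}$$

and letting $G_n$ be the set of $x^n \in \mathcal{A}^n$ satisfying both

$$\left(S'(x^n)\right)^2 \le V' n\log_e n \tag{335}$$

$$S''(x^n) \ge \frac{n\mu''}{2} \tag{336}$$

we see from (275), (329) and (333), applying elementary probability rules, that

$$\mathbb{P}\left[g(X^n) > C_2\log n\right] \le \mathbb{P}\left[g(X^n) > C_2\log n,\ g(X^n) \le \frac{\left(S'(X^n)\right)^2}{2S''(X^n)}\right] + \mathbb{P}\left[g(X^n) > \frac{\left(S'(X^n)\right)^2}{2S''(X^n)}\right] \tag{337}$$

$$\le \mathbb{P}\left[\frac{\left(S'(X^n)\right)^2}{2S''(X^n)} > C_2\log n\right] + \mathbb{P}\left[X^n \notin F_n\right] \tag{338}$$

$$\le \mathbb{P}\left[\frac{\left(S'(X^n)\right)^2}{2S''(X^n)} > C_2\log n,\ X^n \in G_n\right] + \mathbb{P}\left[X^n \notin G_n\right] + \frac{K_1}{\sqrt{n}} \tag{339}$$

$$\le \frac{K'_2}{\sqrt{n}} + \frac{K''_2}{\sqrt{n}} + \frac{K_1}{\sqrt{n}} \tag{340}$$

where the first probability in (339) is zero, because on $G_n$ (335), (336) and (316) give $\frac{(S'(x^n))^2}{2S''(x^n)} \le \frac{V'\log_e n}{\mu''} = C_2\log n$. We conclude that (315) holds for $n \ge n_0$ with $K_2 = K_1 + K'_2 + K''_2$. ■

To apply Lemmas 4 and 5 to (273), note that (272) (and hence (273)) holds for $x^n \in F_n$ due to (296). Weakening (273) using Lemmas 4 and 5 and the union bound, we conclude that Lemma 2 holds with

$$C = \frac{1}{2} + C_2 \tag{341}$$

$$K = K_1 + K_2 \tag{342}$$

$$c = (\lambda^* + \delta)\tau - \log C_1 \tag{343}$$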



Appendix E
Proof of Theorem 14

In this appendix, we show that (117) follows from (82). Fix a point $(d_\infty, R_\infty)$ on the rate-distortion curve such that $d_\infty \in (\underline{d}, \bar d)$. Let $d_n = D(n, R_\infty, \epsilon)$, and let $\alpha_n$ be the acute angle between the tangent to the $R(d)$ curve at $d = d_n$ and the $d$ axis (see Fig. 10). We are interested in the difference $d_n - d_\infty$. Since [10]

$$\lim_{n\to\infty} D(n,R,\epsilon) = D(R), \tag{344}$$

there exists a $\delta > 0$ such that for large enough $n$,

$$d_n \in \mathcal{B}_\delta(d_\infty) = \left[d_\infty - \delta, d_\infty + \delta\right] \subset (\underline{d}, \bar d) \tag{345}$$

For such $d_n$,

$$d_n - d_\infty \le \frac{R_\infty - R(d_n)}{\tan\alpha_n} \tag{346}$$

$$= \frac{R(n,d_n,\epsilon) - R(d_n)}{|R'(d_n)|} \tag{347}$$

$$\le \frac{1}{\min_{d\in\mathcal{B}_\delta(d_\infty)}|R'(d)|}\,O\left(\frac{1}{\sqrt{n}}\right) \tag{348}$$

where

• (346) is by convexity of $R(\cdot)$;
• (347) follows by substituting $R(n,d_n,\epsilon) = R_\infty$ and $\tan\alpha_n = |R'(d_n)|$;
• (348) follows by Theorem 12. Note that we are allowed to plug $d_n$ into (82) because the remainder in (82) can be uniformly bounded over all $d$ from the compact set $\mathcal{B}_\delta(d_\infty)$ (just swap $B_n$ in (105) for the maximum of the $B_n$'s over $\mathcal{B}_\delta(d_\infty)$, and similarly swap $c, K, B_n$ in (112) and (113) for the corresponding maxima); thus (82) holds not only for a fixed $d$ but also for any sequence $d_n \in \mathcal{B}_\delta(d_\infty)$.

It remains to refine (348) to show (117). Write

$$V(d_n) = V(d_\infty) + O\left(\frac{1}{\sqrt{n}}\right) \tag{349}$$

$$R(d_n) = R(d_\infty) + R'(d_\infty)(d_n - d_\infty) + O\left(\frac{1}{n}\right) \tag{350}$$

$$R_\infty = R(d_n) + \sqrt{\frac{V(d_n)}{n}}\,Q^{-1}(\epsilon) + \theta\left(\frac{\log n}{n}\right) \tag{351}$$

$$= R_\infty + R'(d_\infty)(d_n - d_\infty) + \sqrt{\frac{V(d_\infty)}{n}}\,Q^{-1}(\epsilon) + \theta\left(\frac{\log n}{n}\right) \tag{352}$$

where

• (349) and (350) follow by Taylor's theorem and (348), using finiteness of $V'(d)$ and $R''(d)$ for all $d \in \mathcal{B}_\delta(d_\infty)$;
• (351) expands $R_\infty = R(n, d_n, \epsilon)$ using (82);
• (352) invokes (349) and (350).

Rearranging (352), we obtain the desired approximation (117) for the difference $d_n - d_\infty$.

Fig. 10. Estimating $d_n - d_\infty$ from $R(n,d,\epsilon) - R(d)$.
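Rearranging (352) gives, to first order, $d_n - d_\infty \approx \sqrt{V(d_\infty)/n}\,Q^{-1}(\epsilon)/|R'(d_\infty)|$. As an illustration, the sketch below evaluates this expression for the GMS, assuming (in nats) $R(d) = \frac12\ln\frac{\sigma^2}{d}$, so that $|R'(d)| = \frac{1}{2d}$, and $V(d) = \frac12$ as read off from (234). It compares the result with the exact solution of $R(d_n) + \sqrt{V/n}\,Q^{-1}(\epsilon) = R(d_\infty)$, in which the $\theta\left(\frac{\log n}{n}\right)$ remainder is simply dropped. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def distortion_gap_first_order(v: float, r_prime_abs: float, n: int, eps: float) -> float:
    """First-order d_n - d_inf from (352): sqrt(V/n) * Q^{-1}(eps) / |R'(d_inf)|."""
    return math.sqrt(v / n) * NormalDist().inv_cdf(1.0 - eps) / r_prime_abs


def gms_gap_exact(d_inf: float, n: int, eps: float) -> float:
    """Solve (1/2) ln(1/d_n) + sqrt(1/(2n)) Q^{-1}(eps) = (1/2) ln(1/d_inf) for d_n - d_inf (sigma^2 = 1)."""
    shift = math.sqrt(1.0 / (2.0 * n)) * NormalDist().inv_cdf(1.0 - eps)
    return d_inf * math.expm1(2.0 * shift)
```

For $d_\infty = 1/4$, $n = 10^4$ and $\epsilon = 0.1$, both expressions give a distortion penalty of about $0.0045$ and agree to within about one percent.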



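Appendix F
Proof of Theorem 18

From the Stirling approximation, it follows that (e.g. [39])

$$\sqrt{\frac{n}{8k(n-k)}}\exp\left\{nh\left(\frac{k}{n}\right)\right\} \le \binom{n}{k} \tag{353}$$

$$\binom{n}{k} \le \sqrt{\frac{n}{2\pi k(n-k)}}\exp\left\{nh\left(\frac{k}{n}\right)\right\} \tag{354}$$

In view of the inequality

$$\binom{n}{k-j} \le \left(\frac{k}{n-k}\right)^j\binom{n}{k} \tag{355}$$

we can write

$$\binom{n}{k} \le \sum_{j=0}^k \binom{n}{j} \tag{356}$$

$$\le \binom{n}{k}\sum_{j=0}^k\left(\frac{k}{n-k}\right)^j \tag{357}$$

$$\le \binom{n}{k}\frac{n-k}{n-2k} \tag{358}$$

where (358) holds as long as the series converges, i.e. as long as $2k < n$. Furthermore, combining (356) and (358) with Stirling's approximation (353) and (354), we conclude that for any $0 < \alpha < \frac{1}{2}$,

$$\log\sum_{j=0}^{\lfloor n\alpha\rfloor}\binom{n}{j} = nh(\alpha) - \frac{1}{2}\log n + O(1) \tag{359}$$

Taking logarithms in (124) and letting $\log M = nR$ for any $R \ge R(n,d,\epsilon)$, we obtain

$$\log(1-\epsilon) \le n(R - \log 2) + \log\sum_{j=0}^{\lfloor nd\rfloor}\binom{n}{j} \tag{360}$$

$$\le n\left(R - \log 2 + h(d)\right) - \frac{1}{2}\log n + O(1) \tag{361}$$

Since (361) holds for any $R \ge R(n,d,\epsilon)$, we conclude that

$$R(n,d,\epsilon) \ge R(d) + \frac{1}{2}\frac{\log n}{n} + O\left(\frac{1}{n}\right) \tag{362}$$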
Similarly, Corollary 17 implies that there exists an $(\exp(nR), d, \epsilon)$ code with

$$\log\epsilon \le \exp(nR)\log\left(1 - 2^{-n}\sum_{j=0}^{\lfloor nd\rfloor}\binom{n}{j}\right) \tag{363}$$

$$\le -\exp(nR)\,2^{-n}\sum_{j=0}^{\lfloor nd\rfloor}\binom{n}{j}\log e \tag{364}$$

where we used $\log(1+x) \le x\log e$, $x > -1$. Taking the logarithm of the negative of both sides in (364), we have

$$\log\log\frac{1}{\epsilon} \ge n(R - \log 2) + \log\sum_{j=0}^{\lfloor nd\rfloor}\binom{n}{j} + \log\log e \tag{365}$$

$$= n\left(R - \log 2 + h(d)\right) - \frac{1}{2}\log n + O(1), \tag{366}$$

where (366) follows from (359). Therefore,

$$R(n,d,\epsilon) \le R(d) + \frac{1}{2}\frac{\log n}{n} + O\left(\frac{1}{n}\right) \tag{367}$$

The case $d = 0$ follows directly from (89). Alternatively, it can be easily checked by substituting $\sum_{j=0}^{0}\binom{n}{j} = 1$ in the analysis above.
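The sandwich (356)–(358) and the expansion (359) are easy to check with exact integer arithmetic. The sketch below (names hypothetical) verifies (356) and (358) for $k = \lfloor n\alpha\rfloor$, cross-multiplying to stay in integers, and reports the remainder $\log_2\sum_{j\le k}\binom{n}{j} - \left(nh(\alpha) - \frac12\log_2 n\right)$, which (359) asserts is $O(1)$.

```python
import math


def h2(a: float) -> float:
    """Binary entropy in bits."""
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)


def binomial_sum_check(n: int, alpha: float):
    k = math.floor(n * alpha)
    s = sum(math.comb(n, j) for j in range(k + 1))  # exact integer sum
    c = math.comb(n, k)
    lower_ok = c <= s                               # (356)
    upper_ok = s * (n - 2 * k) <= c * (n - k)       # (358), requires 2k < n
    remainder = math.log2(s) - (n * h2(alpha) - 0.5 * math.log2(n))
    return lower_ok, upper_ok, remainder
```

For $\alpha = 0.11$ the remainder stays within a couple of bits as $n$ ranges from $10^2$ to $10^4$, while $nh(\alpha)$ itself grows to several thousand bits.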



Appendix G
Gaussian approximation
of the bound in Theorem 22

By analyzing the asymptotic behavior of (147), we prove that

$$R(n,d,\epsilon) \le h(p) - h(d) + \sqrt{\frac{V(d)}{n}}\,Q^{-1}(\epsilon) + \frac{1}{2}\frac{\log n}{n} + \frac{\log\log n}{n} + O\left(\frac{1}{n}\right) \tag{368}$$

where $V(d)$ is as in (149), thereby showing that a constant composition code that attains the rate-dispersion function exists. Letting $M = \exp(nR)$ and using $(1-x)^M \le e^{-Mx}$ in (147), we can guarantee existence of an $(n, M, d, \epsilon')$ code with

$$\epsilon' \le \sum_{k=0}^n \binom{n}{k}p^k(1-p)^{n-k}\exp\left\{-\binom{n}{\lceil nq\rceil}^{-1}L_n(k,\lceil nq\rceil)\exp(nR)\right\} \tag{369}$$

In what follows we will show that one can choose an $R$ satisfying the right side of (368) so that the right side of (369) is upper bounded by $\epsilon$ when $n$ is large enough. Letting $k = np + n\Delta$, $t = \lceil nq\rceil$, $t_0 = \left\lceil\frac{t + k - nd}{2}\right\rceil^+$ and using Stirling's formula (353), it is an algebraic exercise to show that there exist positive constants $\delta$ and $C$ such that for all $|\Delta| \le \delta$,

$$\binom{n}{t}^{-1}L_n(k,t) \ge \binom{n}{k}^{-1}\binom{t}{t_0}\binom{n-t}{k-t_0} \tag{370}$$

$$\ge \frac{C}{\sqrt{n}}\exp\left\{-ng(\Delta)\right\} \tag{371}$$

where

$$g(\Delta) = h(p+\Delta) - qh\left(d - \frac{\Delta}{2q}\right) - (1-q)h\left(d + \frac{\Delta}{2(1-q)}\right)$$

It follows that

$$\binom{n}{\lceil qn\rceil}^{-1}L_n\left(np + n\Delta, \lceil qn\rceil\right) \ge \frac{C}{\sqrt{n}}\exp\left\{-ng(\Delta)\right\} \tag{372}$$

whenever $L_n(k, \lceil qn\rceil)$ is nonzero, that is, whenever $\lceil nq\rceil - nd \le k \le \lceil nq\rceil + nd$, and $g(\Delta) = +\infty$ otherwise.

Applying a Taylor series expansion in the vicinity of $\Delta = 0$ to $g(\Delta)$, we get

$$g(\Delta) = h(p) - h(d) + h'(p)\Delta + O\left(\Delta^2\right) \tag{373}$$

Since $g(\Delta)$ is continuously differentiable with $g'(0) = h'(p) > 0$, there exist constants $\underline{b}, b > 0$ such that $g(\Delta)$ is monotonically increasing on $(-\underline{b}, b)$ and (371) holds. Let



n 

2Bn 



V{d) 1 , 
_ n V 27rn b 
1 - 2p + 2p2 



Bn 



1 log n 

2 n 



— log 

n 



loe 



2C 



(374) 
(375) 
(376) 
(377) 



Using (373) and applying a Taylor series expansion to $Q^{-1}(\cdot)$, it is easy to see that $R$ in (377) can be rewritten as the right side of (368). Splitting the sum in (369) into three sums and upper bounding each of them separately, we have



A;=0 

E" 

/c=0 



\np+nbn\ n 
k— \_7ip—nb\ 4-1 k— \ np-\-7ibn\ +1 



(378) 



< 



i=l 
Inp+nbn ] 



Xi < np — nb 

p'ii-pY 



E 

k— Inp—nb] +1 



> np + nbn 



< 



27rri b 



'2V(d) 



exp j^ni?— ^^^) } 



(379) 



^ (380) 



(381) 



where $\{X_i\}$ are i.i.d. Bernoulli random variables with bias $p$. The first and third probabilities in the right side of (379) are bounded using the Berry-Esseen bound (95) and (330), while the second probability is bounded using the monotonicity of $g(\Delta)$ in $(-\underline{b}, b_n]$ for large enough $n$, in which case the minimum difference between $R$ and $g(\Delta)$ in $(-\underline{b}, b_n)$ is $\frac{1}{2}\frac{\log n}{n} + \frac{1}{n}\log\frac{\log_e n}{2C}$.



Appendix H
Proof of Theorem 27

In order to study the asymptotics of (161) and (163), we need to analyze the asymptotic behavior of $S_{\lfloor nd\rfloor}$, which can






be carried out similarly to the binary case. Recalling the where (I395l l follows from (1388) . Therefore 
inequaUty ( I355l l. we have ^ ^ ^ ^ 

Sk=J2(''.){m-iy (382) 



k 

/ n \ ^ 

< 



R(n,d,e) < R(d) + + [ - ] (396) 
2 n V 71 ' 



The case $d = 0$ follows directly from (89), or can be obtained by observing that $S_0 = 1$ in the analysis above.



4:)("-')''g( („_tK„-i) y "*^> 



Appendix I
Gaussian approximation
of the bound in Theorem 30

j=o "-JV"- Using Theorem [30l we show that 



n \ , ^ , I, n 



- k 



^'''^ i?(n,ci,e)<i?(d) + ^^g-(.) (397) 

where ( I385l l holds as long as the series converges, i.e. as long (m — l)(m^ — 1) logri log log n ^ / 1 

as ^ < Using + 2 ~ + + ^ U 

> (^"^ (■ \\^ G86) where is defined in ( fTTOl ). and V^frf) is as in ( |190l l. Similar 

^ ~ \k) to the binary case, we express i„(k, t*) in terms of the 

, 1 • 1- ' ^ ogoi J x nrA\ rate-distortion function. Observe that whenever L„(k, t*) is 

and applymg Stirlmg s approximation (13531 1 and (1354) . we ri\ ■, > 

have for < d < nonzero, 

\— 1 /\— l"i/7N 



n(t) (^'^«) 



log 5L„dj = log ^^"^j j+nd log(m - 1) + 0(1) (387) l^t y y 

= + ndlog(m - 1) - 1 log n + 0(1) (388) ^ Q ' iS (k ) ^^'^'^^ 

Taking logarithms in (II6II 1 and letting log A/ = ni? for any ""^ 

i? > d, e), we obtain where k,, = {ti^t, • • • , im,fc)- It can be shown ||34l that for 

n large enough, there exist positive constants Ci , C2 such 

log(l - e) < - log m) + log L„dj (389) that ( |400l l and ( |40T] l at the bottom of the page hold for small 

< n(R — logTO + h{d) + c?log(m — 1)) enough | A|, where A ~ (Ai, . . . , A^). A simple calculation 

1 using V™ 1 Aa ~ reveals that 
-ilogn + 0(l) (390) 

2 m rur, _^ 

Since ( |390l ) holds for any i? > d, e), we conclude that X] 51 p* ( \h\ 

i?(n,d,6)>i?(d) + -^ + - (391) , 1 ^ . , 1 

Similarly, Theorem |26] implies that there exists an "=1 a-m^+i 

(exp(ni?), d, e) code with so invoking (|400| | and (|40T] | one can write 



loge<exp(ni?)log(^l--^j (392) (n\ > exp {-n,(A)} 



^k 

<-exp(ni?)%iloge (393) (403) 

where C is a constant, and (7(A) is a twice differentiable 
where we used log(l + .t) < xloge, x > -1. Taking the function that satisfies 
logarithm of the negative of both sides of (|393l l, we have m 

1 5(A) = i?(d) + ^A,z;(a) + 0(|A|2) (404) 

loglog- > logm) + log5L„dj +logloge (394) ^=1 

= n(i?-logm + /i(d))-ilogn + 0(l), (395) v{a) = mm l^ixi a), log (405) 



^ < Cin-^ exp n |i7(X) + E log + O {\Af) | 

> C2n-^ expn I PY*(6)iJ (X|Y* = 6) + E ^) l^S p. ] i.x + O (| Ap) I 

I a=l ^X\yWO) J 



(400) 
(401) 
Similar to the BMS case, $g(\Delta)$ is monotonic in $\sum_{a=1}^m \Delta_a v(a) \in (-\underline{b}, b)$ for some constants $\underline{b}, b > 0$ independent of $n$. Let



V n 



e« = e - 
R = 



2Bn 



/n 



/n 



max 

(rn — — 1) log J 



2nn b 
5(A) 



_g "'2V(ci) 



1 , f loge " 



(406) 
(407) 

(408) 



where $B_n$ is the finite constant defined in (99). Using (404) and applying a Taylor series expansion to $Q^{-1}(\cdot)$, it is easy to see that $R$ in (408) can be rewritten as the right side of (397). Further, we use $nR = \log M$ and $(1-x)^M \le e^{-Mx}$ to weaken the right side of (180) to obtain

\^ { Vn(p+A) -(;i)"'L„(n(p+A),t*)cxp(n_R) 



E 



E 



E 



A: A: A: 

(409) 



< 



+ 



sup I 

A: 

J:'^^^ Aav{a)e{-b,b„) 



-Cn 2 cxp n{fl-g(A)} 



5;]«(Xfc)>E[i;(X)]+6„ 



.k=l 



(410) 



1 



^ ^ + \hr^k"^ + ^ + + ^ ^411) 



where Px^.ia) = -Px(a). The first and third probabilities in 
(409) are bounded using the Berry-Esseen bound (95) and (330). The middle probability is bounded by observing that the difference between $R$ and $g(\Delta)$ in $\sum_{a=1}^m \Delta_a v(a) \in (-\underline{b}, b_n)$ is at least



(m — l)(m,, — 1) logn 



+ 



Appendix J
Proof of Theorem 34

Converse: The proof of the converse part follows the 
Gaussian approximation analysis of the converse bound in 
Theorem |32] Let j ~ + nAi and k = nS — nlS.2- Using 
Stirling's approximation for the binomial sum (I359K after 
applying a Taylor series expansion we have 



2" (n— A:) 



n ^ k 
[nd - j\ 



C{A) 



cxp{-ng(Ai,A2)} (412) 



where $C(\Delta)$ is such that there exist positive constants $\underline{C}, \bar C, \delta$ such that $\underline{C} \le C(\Delta) \le \bar C$ for all $|\Delta| \le \delta$, and the twice differentiable function $g(\Delta_1, \Delta_2)$ can be written as

g(Ai, A2) = Rid) + ai Ai + a2A2 + O (| Ap) 

1 - d- I 
ai = log — : = A* 



d-l 



a2 ^ log ■ 



2 1 



= log- 



(413) 
(414) 

(415) 



1-6 °l + exp(-A*) 

It follows from (413) that $g(\Delta_1, \Delta_2)$ is increasing in $a_1\Delta_1 + a_2\Delta_2 \in (-\underline{b}, b)$ for some constants $\underline{b}, b > 0$ (obviously, we can choose $\underline{b}, b$ small enough in order for $\underline{C} \le C(\Delta) \le \bar C$ to hold). In the sequel, we will represent the probabilities in the right side of (203) via a sequence of i.i.d. random variables $Z_1, \ldots, Z_n$ with common distribution



Z = 



w.p. f 
w.p. 1 — S 
otherwise 



Note that 



;[Z] = ^ + a2(l-5) 



(416) 



(417) 



Var [Z] = 6(1 - 5) (02 - y ) ' + ^ = V{d) (418) 

and the third central moment of $Z$ is finite, so that $B_n$ in (99) is a finite constant. Let



(419) 



C 



2B,, 



V{d) 1 



R = 



/n I y/n V 27m b 

ff(Ai,A2) 



mm 

Ai, A2: 

fc„<aiAi+a2A2<b 

0(bl 



R{d) + b„ 



'2VW (420) 
(421) 

(422) 



With $M = \exp(nR)$, since $R \le g(\Delta_1, \Delta_2)$ for all $a_1\Delta_1 + a_2\Delta_2 \in [b_n, b]$, for such $(\Delta_1, \Delta_2)$ it holds that



C 

1- ^Mexp{-n5(Ai,A2)} 

Denoting the random variables 
1 " 

N{x) = -^1{Z,; ^x} 



> 1 - 



7^ 



G„ = n5(7V(ai)--,JV(a2) 



1 



(423) 



(424) 



(425) 



and using (412) to express the probability in the right side of (203) in terms of $Z_1, \ldots, Z_n$, we conclude that the excess-distortion probability is lower bounded by

^ +1 

exp{logM - Gn} 




bn < E ^» " < ^ 

2B„ 



Mi, 

27rn 



(426) 

(427) 
(428) 
where (426) follows from (423), (427) follows from the Berry-Esseen inequality (95) and (330), and (428) is equivalent to (419). ■
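A quick numerical check of (417), (418) and of the change of variables behind (425): the sketch below computes the moments of $Z$ from the three-point law (416) and, for one seeded random draw, verifies that $\sum_{i=1}^n Z_i - n\,\mathbb{E}[Z] = n(a_1\Delta_1 + a_2\Delta_2)$ when $j$ counts the $Z_i$ equal to $\lambda^\star$ and $n-k$ counts those equal to $a_2$. Reading $k$ as the number of erased positions and $j$ as the number of erased positions reproduced in error is our interpretation, not a statement from the text; the helper `z_law` and the parameter values are illustrative.

```python
import math
import random

def z_law(d, delta):
    """Three-point law (416) of Z as (value, probability) pairs; needs delta/2 < d."""
    lam = math.log((1 - d - delta / 2) / (d - delta / 2))   # a_1 = lambda*, (414)
    a2 = math.log(2 / (1 + math.exp(-lam)))                 # a_2, (415)
    return lam, a2, [(lam, delta / 2), (a2, 1 - delta), (0.0, delta / 2)]

d, delta, n = 0.2, 0.1, 50
lam, a2, law = z_law(d, delta)
mean = sum(v * p for v, p in law)
var = sum((v - mean) ** 2 * p for v, p in law)
print(mean, delta * lam / 2 + a2 * (1 - delta))                               # (417)
print(var, delta * (1 - delta) * (a2 - lam / 2) ** 2 + delta * lam ** 2 / 4)  # (418)

# One seeded draw of Z_1, ..., Z_n and the counts j, k that it determines
rng = random.Random(1)
zs = rng.choices([v for v, _ in law], weights=[p for _, p in law], k=n)
j = zs.count(lam)             # number of Z_i equal to lambda*
k = n - zs.count(a2)          # n minus the number of Z_i equal to a_2
delta1 = j / n - delta / 2    # from j = n*delta/2 + n*Delta_1
delta2 = delta - k / n        # from k = n*delta - n*Delta_2
print(sum(zs) - n * mean, n * (lam * delta1 + a2 * delta2))                   # equal
```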

Achievability: We now proceed to the Gaussian approximation analysis of the achievability bound in Theorem 33. Let



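\begin{align}
\log M &= n\max_{\substack{\Delta_1,\Delta_2:\\ -\underline{b} \le a_1\Delta_1 + a_2\Delta_2 \le b_n}} g(\Delta_1,\Delta_2) + \frac{1}{2}\log n + \log\frac{\log_e n}{2\underline{C}} \tag{429}\\
b_n &= \sqrt{\frac{V(d)}{n}}\,Q^{-1}\left(\epsilon - \frac{2B_n + 1}{\sqrt{n}} - \sqrt{\frac{V(d)}{2\pi n}}\,\frac{1}{\underline{b}}\right) \tag{430}\\
\log M &= nR(d) + nb_n + \frac{1}{2}\log n + \log\log n + O(1) \tag{431}\\
&= nR(d) + \sqrt{nV(d)}\,Q^{-1}(\epsilon) + \frac{1}{2}\log n + \log\log n + O(1) \tag{432}
\end{align}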



where $g(\Delta_1,\Delta_2)$ is defined in (412), and (432) follows from (413) and a Taylor series expansion of $Q^{-1}(\cdot)$. Using (412) and $(1-x)^M \le e^{-Mx}$ to weaken the right side of (207), and expressing the resulting probability in terms of i.i.d. random variables $Z_1, \ldots, Z_n$ with common distribution (416), we conclude that the excess-distortion probability is upper bounded by (recall notation (425))



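\begin{align}
&\mathbb{E}\left[\exp\left\{-\frac{\underline{C}}{\sqrt{n}}\exp\{\log M - G_n\}\right\}\right] \nonumber\\
&\quad\le \mathbb{P}\left[\sum_{i=1}^n Z_i > n\mathbb{E}[Z] + nb_n\right] + \mathbb{P}\left[\sum_{i=1}^n Z_i < n\mathbb{E}[Z] - n\underline{b}\right] \nonumber\\
&\qquad + \mathbb{E}\left[\exp\left\{-\frac{\underline{C}}{\sqrt{n}}\exp\{\log M - G_n\}\right\} 1\left\{-n\underline{b} \le \sum_{i=1}^n Z_i - n\mathbb{E}[Z] \le nb_n\right\}\right] \tag{433}\\
&\quad\le Q\left(b_n\sqrt{\frac{n}{V(d)}}\right) + \frac{B_n}{\sqrt{n}} + \sqrt{\frac{V(d)}{2\pi n}}\,\frac{1}{\underline{b}} + \frac{B_n}{\sqrt{n}} + \frac{1}{\sqrt{n}} \tag{434}\\
&\quad= \epsilon \tag{435}
\end{align}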



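where the probabilities are upper bounded by the Berry-Esseen inequality (95) and (330), and the expectation is bounded using the fact that in $-\underline{b} \le a_1\Delta_1 + a_2\Delta_2 \le b_n$ the minimum difference between $\log M$ and $n\,g(\Delta_1,\Delta_2)$ is $\frac{1}{2}\log n + \log\frac{\log_e n}{2\underline{C}}$, so that on this event the integrand is at most $e^{-\frac{1}{2}\log_e n} = \frac{1}{\sqrt{n}}$. Finally, (435) is just (430).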
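The Berry-Esseen steps in (427) and (434) replace tail probabilities of $\sum_{i=1}^n Z_i$ by $Q(\cdot)$. The sketch below computes one such tail exactly, by enumerating the multinomial counts of the three values of $Z$, at the threshold $nb_n$ with $b_n = \sqrt{V(d)/n}\,Q^{-1}(\epsilon)$ (only the leading term of $b_n$), and checks that the gap to the Gaussian value stays within a Berry-Esseen bound. We assume the absolute constant 0.56, a published value that is valid for i.i.d. summands; the constant in (95) may differ. The helpers `z_law` and `exact_tail` and all parameter values are illustrative.

```python
import math
from statistics import NormalDist

def z_law(d, delta):
    """Three-point law (416) of Z as (value, probability) pairs."""
    lam = math.log((1 - d - delta / 2) / (d - delta / 2))
    a2 = math.log(2 / (1 + math.exp(-lam)))
    return [(lam, delta / 2), (a2, 1 - delta), (0.0, delta / 2)]

def exact_tail(n, t, d, delta):
    """P[sum_i Z_i - n E[Z] > t], summing multinomial probabilities in the log domain."""
    (lam, p_lam), (a2, p_a2), (_, p_zero) = z_law(d, delta)
    mean = lam * p_lam + a2 * p_a2
    total = 0.0
    for u in range(n + 1):                 # u = number of Z_i equal to a_2
        for j in range(n - u + 1):         # j = number of Z_i equal to lambda*
            r = n - u - j                  # r = number of Z_i equal to 0
            if lam * j + a2 * u - n * mean > t:
                log_p = (math.lgamma(n + 1) - math.lgamma(u + 1) - math.lgamma(j + 1)
                         - math.lgamma(r + 1) + u * math.log(p_a2)
                         + j * math.log(p_lam) + r * math.log(p_zero))
                total += math.exp(log_p)
    return total

d, delta, n, eps = 0.2, 0.1, 400, 0.1
law = z_law(d, delta)
mean = sum(v * p for v, p in law)
var = sum((v - mean) ** 2 * p for v, p in law)          # V(d), cf. (418)
rho = sum(abs(v - mean) ** 3 * p for v, p in law)       # third absolute central moment
t = math.sqrt(n * var) * NormalDist().inv_cdf(1 - eps)  # t = n * b_n
exact = exact_tail(n, t, d, delta)
gauss = 1 - NormalDist().cdf(t / math.sqrt(n * var))    # Q(b_n sqrt(n / V(d))) = eps
be_bound = 0.56 * rho / (var ** 1.5 * math.sqrt(n))
print(exact, gauss, be_bound)
```

The enumeration costs $O(n^2)$ terms, which is cheap for moderate $n$ and avoids any sampling noise in the comparison.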



Appendix K
Gaussian Approximation of the Bound in Theorem 37

Using Theorem 37, we show that $R(n, d, \epsilon)$ does not exceed the right-hand side of (234) with the remainder satisfying (236). Since the excess-distortion probability in (221) depends on $\sigma^2$ only through the ratio $\frac{d}{\sigma^2}$, for simplicity we let $\sigma^2 = 1$. Using the inequality $(1-x)^M \le e^{-Mx}$, the right side of (221) can be upper bounded by

$$\int_0^\infty e^{-\rho(n,z)\exp(nR)}\, f_{\chi^2_n}(nz)\, n\, dz \tag{436}$$



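From Stirling's approximation for the Gamma function

$$\Gamma(x+1) = \sqrt{2\pi x}\left(\frac{x}{e}\right)^x\left(1 + O\left(\frac{1}{x}\right)\right) \tag{437}$$

it follows that

$$\frac{\Gamma\left(\frac{n}{2} + 1\right)}{\sqrt{\pi}\, n\, \Gamma\left(\frac{n-1}{2} + 1\right)} = \frac{1}{\sqrt{2\pi n}}\left(1 + O\left(\frac{1}{n}\right)\right) \tag{438}$$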



which is clearly lower bounded by $\frac{1}{2\sqrt{\pi n}}$ for $n$ large enough. This implies that for all $\cdots < z < \cdots$ and all $n$ large enough

$$\rho(n,z) \ge \frac{1}{2\sqrt{\pi n}}\exp\left[\frac{n-1}{2}\log\bigl(1 - g(z)\bigr)\right] \tag{439}$$



where

$$g(z) = \frac{(1 + z - 2d)^2}{4(1-d)z} \tag{440}$$

It is easy to check that $g(z)$ attains its global minimum at $z = [1-2d]^+$ and is monotonically increasing for $z > [1-2d]^+$.
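Both facts are easy to probe numerically. The sketch below evaluates the Gamma-function ratio on the left of (438) through `math.lgamma`, compares it with $1/\sqrt{2\pi n}$ and with the lower bound $1/(2\sqrt{\pi n})$, and locates the minimizer of $g(z)$ in (440) on a grid for one arbitrary value $0 < d < \frac{1}{2}$, where $[1-2d]^+ = 1-2d$. The helper names and the grid are illustrative choices.

```python
import math

def gamma_ratio(n):
    """Gamma(n/2 + 1) / (sqrt(pi) * n * Gamma((n - 1)/2 + 1)), computed via lgamma."""
    log_ratio = math.lgamma(n / 2 + 1) - math.lgamma((n - 1) / 2 + 1)
    return math.exp(log_ratio) / (math.sqrt(math.pi) * n)

def g(z, d):
    """g(z) from (440)."""
    return (1 + z - 2 * d) ** 2 / (4 * (1 - d) * z)

for n in (10, 100, 1000, 10**5):
    r = gamma_ratio(n)
    # ratio to 1/sqrt(2 pi n) approaches 1; r also exceeds 1/(2 sqrt(pi n))
    print(n, r * math.sqrt(2 * math.pi * n), r >= 1 / (2 * math.sqrt(math.pi * n)))

d = 0.3
grid = [k / 10000 for k in range(1, 30001)]     # z in (0, 3]
z_min = min(grid, key=lambda z: g(z, d))
print(z_min, 1 - 2 * d)                         # grid minimizer vs [1 - 2d]^+
```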
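Let

\begin{align}
b_n &= \sqrt{\frac{2}{n}}\,Q^{-1}(\epsilon_n) \tag{441}\\
\epsilon_n &= \epsilon - \frac{2B_n}{\sqrt{n}} - \frac{1}{\sqrt{n}} - \frac{1}{2d\sqrt{\pi n}} \tag{442}\\
R &= -\frac{1}{2}\log\bigl(1 - g(1 + b_n)\bigr) + \frac{1}{2n}\log n + \frac{1}{n}\log\left(\sqrt{\pi}\log_e n\right) \tag{443}
\end{align}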

where $B_n = 12\sqrt{2}$. Using a Taylor series expansion, it is not hard to check that $R$ in (443) can be written as the right side of (234). So, the theorem will be proven if we show that, with $R$ in (443), (436) is upper bounded by $\epsilon$ for $n$ sufficiently large.
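To see numerically how $R$ in (443) approaches the normal approximation, the sketch below evaluates it in nats with $B_n = 12\sqrt{2}$ and compares it with $\frac{1}{2}\log\frac{1}{d} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon)$. That comparison form is our assumption for the leading terms of the right side of (234) with $\sigma^2 = 1$; it follows from $-\frac{1}{2}\log(1 - g(1)) = \frac{1}{2}\log\frac{1}{d}$, the slope $\frac{d}{dz}\left[-\frac{1}{2}\log(1-g(z))\right]_{z=1} = \frac{1}{2}$, and $b_n \approx \sqrt{2/n}\,Q^{-1}(\epsilon)$. Because $2B_n/\sqrt{n}$ must be smaller than $\epsilon$ for $b_n$ to be defined, the sketch uses $n \ge 10^6$; the function names and $d = 0.25$, $\epsilon = 0.1$ are illustrative.

```python
import math
from statistics import NormalDist

def g(z, d):
    """g(z) from (440)."""
    return (1 + z - 2 * d) ** 2 / (4 * (1 - d) * z)

def rate_443(n, d, eps, b_const=12 * math.sqrt(2)):
    """R of (443) in nats, with b_n from (441)-(442) and B_n = 12*sqrt(2)."""
    eps_n = (eps - 2 * b_const / math.sqrt(n) - 1 / math.sqrt(n)
             - 1 / (2 * d * math.sqrt(math.pi * n)))
    if eps_n <= 0:
        raise ValueError("n too small for this epsilon")
    b_n = math.sqrt(2 / n) * NormalDist().inv_cdf(1 - eps_n)
    return (-0.5 * math.log(1 - g(1 + b_n, d)) + math.log(n) / (2 * n)
            + math.log(math.sqrt(math.pi) * math.log(n)) / n)

def normal_approx(n, d, eps):
    """Assumed leading terms: (1/2) log(1/d) + sqrt(1/(2n)) Q^{-1}(eps), in nats."""
    return 0.5 * math.log(1 / d) + math.sqrt(1 / (2 * n)) * NormalDist().inv_cdf(1 - eps)

d, eps = 0.25, 0.1
for n in (10**6, 10**8):
    r = rate_443(n, d, eps)
    gap = r - normal_approx(n, d, eps)
    print(n, r, normal_approx(n, d, eps), gap * n / math.log(n))   # gap scaled by n/log n
```

The scaled gap stays bounded as $n$ grows, in line with a remainder of order $\frac{\log n}{n}$; its size at these $n$ is dominated by the large constant $B_n$.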

Toward this end, we split the integral in (436) into three integrals and upper bound each separately:

$$\int_0^\infty = \int_0^{[1-2d]^+} + \int_{[1-2d]^+}^{1+b_n} + \int_{1+b_n}^\infty \tag{444}$$

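The first and the third integrals can be upper bounded using the Berry-Esseen inequality (95) and (330), where $X_1, \ldots, X_n$ are i.i.d. $\mathcal{N}(0,1)$, so that $f_{\chi^2_n}$ is the density of $\sum_{i=1}^n X_i^2$:

\begin{align}
\int_0^{[1-2d]^+} e^{-\rho(n,z)\exp(nR)} f_{\chi^2_n}(nz)\,n\,dz &\le \mathbb{P}\left[\sum_{i=1}^n X_i^2 \le n(1-2d)\right] \tag{445}\\
&\le \frac{B_n}{\sqrt{n}} + \frac{1}{2d\sqrt{\pi n}} \tag{446}\\
\int_{1+b_n}^\infty e^{-\rho(n,z)\exp(nR)} f_{\chi^2_n}(nz)\,n\,dz &\le \mathbb{P}\left[\sum_{i=1}^n X_i^2 > n(1+b_n)\right] \tag{447}\\
&\le Q\left(b_n\sqrt{\frac{n}{2}}\right) + \frac{B_n}{\sqrt{n}} \tag{448}
\end{align}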



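Finally, the second integral is upper bounded by $\frac{1}{\sqrt{n}}$ because, by the monotonicity of $g(z)$,

\begin{align}
e^{-\rho(n,z)\exp(nR)} &\le \exp\left\{-\frac{1}{2\sqrt{\pi n}}\exp\left\{\frac{1}{2}\log n + \log\left(\sqrt{\pi}\log_e n\right)\right\}\right\} \tag{449}\\
&= \frac{1}{\sqrt{n}} \tag{450}
\end{align}

for all $[1-2d]^+ \le z \le 1 + b_n$.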
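The chain from (439) and (443) to (449) and (450) can be checked numerically without overflow by working with logarithms. The sketch below evaluates the logarithm of the lower bound (439) on $\rho(n,z)$ times $\exp(nR)$, with $R$ as in (443) for a fixed $b_n$ (treated as a free parameter, since the argument only uses $g(z) \le g(1+b_n)$), and confirms that $e^{-\rho(n,z)\exp(nR)} \le \frac{1}{\sqrt{n}}$ on a grid of $z \in [1-2d, 1+b_n]$. Natural logarithms are assumed throughout; the helper names and the values of $n$, $d$, $b_n$ are illustrative.

```python
import math

def g(z, d):
    """g(z) from (440)."""
    return (1 + z - 2 * d) ** 2 / (4 * (1 - d) * z)

def log_rho_times_m(n, z, d, b_n):
    """ln of [lower bound (439) on rho(n, z)] * exp(nR), with R as in (443)."""
    n_rate = (-0.5 * n * math.log(1 - g(1 + b_n, d)) + 0.5 * math.log(n)
              + math.log(math.sqrt(math.pi) * math.log(n)))
    log_rho = -math.log(2 * math.sqrt(math.pi * n)) + 0.5 * (n - 1) * math.log(1 - g(z, d))
    return log_rho + n_rate

def worst_integrand(n, d, b_n, points=1000):
    """Largest exp{-rho exp(nR)} over a grid of z in [1 - 2d, 1 + b_n], for 0 < d < 1/2."""
    lo, hi = 1 - 2 * d, 1 + b_n
    worst = 0.0
    for k in range(points + 1):
        x = log_rho_times_m(n, lo + (hi - lo) * k / points, d, b_n)
        value = 0.0 if x > 700 else math.exp(-math.exp(x))   # avoid overflow in exp(x)
        worst = max(worst, value)
    return worst

n, d, b_n = 10**4, 0.25, 0.05
print(worst_integrand(n, d, b_n), 1 / math.sqrt(n))
```

The largest value occurs at $z = 1 + b_n$, where the exponent equals $\log\left(\frac{1}{2}\log_e n\right) - \frac{1}{2}\log(1 - g(1+b_n))$, slightly more than needed for the $\frac{1}{\sqrt{n}}$ bound.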